Preface


This is a translation of the (slightly revised) second German edition of our book 
“Lineare Algebra”, published by Springer Spektrum in 2015. Our general view 
of the field of Linear Algebra and the approach to it that we have chosen in this 
book were already described in our Preface to the First German Edition, published 
by Vieweg+Teubner in 2012. In a nutshell, our exposition is matrix-oriented, and 
we aim at presenting a rather complete theory (including all details and proofs), 
while keeping an eye on the applicability of the results. Many of them, though 
appearing very theoretical at first sight, are of immediate practical relevance. In
our experience, the matrix-oriented approach to Linear Algebra leads to a better 
intuition and a deeper understanding of the abstract concepts, and therefore sim- 
plifies their use in real-world applications. 

Starting from basic mathematical concepts and algebraic structures, we develop
the classical theory of matrices, vector spaces, and linear maps, culminating in the
proof of the Jordan canonical form. In addition to the characterization of important 
special classes of matrices or endomorphisms, the last chapters of the book are 
devoted to special topics: Matrix functions and systems of differential equations, the 
singular value decomposition, the Kronecker product, and linear matrix equations. 
These chapters can be used as starting points of more advanced courses or seminars 
in Applied Linear Algebra. 

Many people helped us with the first two German editions and this English edition 
of the book. In addition to those mentioned in the Preface to the First German 
Edition, we would like to particularly thank Olivier Sète, who carefully worked 
through the entire draft of the second edition and gave numerous comments, as well 
as Leonhard Batzke, Carl De Boor, Sadegh Jokar, Robert Luce, Christian Mehl,
Helia Niroomand Rad, Jan Peter Schäfermeier, Daniel Wachsmuth, and Gisbert
Wüstholz. Thanks also to the staff of Springer Spektrum, Heidelberg, and
Springer-Verlag, London, for their support and assistance with editorial aspects of 
this English edition. 


Berlin, July 2015
Jörg Liesen
Volker Mehrmann


Preface to the First German Edition 


Mathematics is the instrument that links theory and practice, thinking and observing; 
it establishes the connecting bridge and builds it stronger and stronger. This is why our 
entire culture these days, as long as it is concerned with understanding and harnessing 
nature, has Mathematics as its foundation.¹


This assessment of the famous mathematician David Hilbert (1862–1943) is even
more true today. Mathematics is found not only throughout the classical natural
sciences, Biology, Chemistry and Physics; its methods have also become indispensable
in Engineering, Economics, Medicine, and many other areas of life. This continuing
mathematization of the world is possible because of the transversal strength of 
Mathematics. The abstract objects and operations developed in Mathematics can be 
used for the description and solution of problems in numerous different situations. 

While the high level of abstraction of modern Mathematics continuously 
increases its potential for applications, it represents a challenge for students. This is 
particularly true in the first years, when they have to become familiar with a lot of 
new and complicated terminology. In order to get students excited about mathe- 
matics and capture their imagination, it is important for us teachers of basic courses 
such as Linear Algebra to present Mathematics as a living science in its global 
context. The short historical notes in the text and the list of some historical papers at 
the end of this book show that Linear Algebra is the result of a human endeavor. 

An important guideline of the book is to demonstrate the immediate practical
relevance of the developed theory. Right at the beginning we illustrate several
concepts of Linear Algebra in everyday life situations. We discuss mathematical
basics of the search engine Google and of the premium rate calculations of car
insurances. These and other applications will be investigated in later chapters using
theoretical results. Here the goal is not to study the concrete examples or their
solutions, but to present the transversal strength of mathematical methods
in the Linear Algebra context.

¹ „Das Instrument, welches die Vermittlung bewirkt zwischen Theorie und Praxis, zwischen
Denken und Beobachten, ist die Mathematik; sie baut die verbindende Brücke und gestaltet sie
immer tragfähiger. Daher kommt es, dass unsere ganze gegenwärtige Kultur, soweit sie auf der
geistigen Durchdringung und Dienstbarmachung der Natur beruht, ihre Grundlage in der
Mathematik findet.“

The central object for our approach to Linear Algebra is the matrix, which we 
introduce early on, immediately after discussing some of the basic mathematical 
foundations. Several chapters deal with some of their most important properties, 
before we finally make the big step to abstract vector spaces and homomorphisms. 
In our experience the matrix-oriented approach to Linear Algebra leads to a better 
intuition and a deeper understanding of the abstract concepts. 

The same goal should be reached by the MATLAB-Minutes² that are scattered
throughout the text and that allow readers to comprehend the concepts and results 
via computer experiments. The required basics for these short exercises are intro- 
duced in the Appendix. Besides the MATLAB-Minutes there are a large number of 
classical exercises, which just require a pencil and paper. 

Another advantage of the matrix-oriented approach to Linear Algebra is given 
by the simplifications when transferring theoretical results into practical algorithms. 
Matrices show up wherever data are systematically ordered and processed, which 
happens in almost all future job areas of bachelor students in the mathematical 
sciences. This has also motivated the topics in the last chapters of this book: matrix 
functions, the singular value decomposition, and the Kronecker product. 

Despite many comments on algorithmic and numerical aspects, the focus in this 
book is on the theory of Linear Algebra. The German physicist Gustav Robert 
Kirchhoff (1824–1887) is credited with having said:


A good theory is the most practical thing there is.³


This is exactly how we view our approach to the field. 

This book is based on our lectures at TU Chemnitz and TU Berlin. We would 
like to thank all students, co-workers, and colleagues who helped in preparing and 
proofreading the manuscript, in the formulation of exercises, and with the content 
of lectures. Our special thanks go to André Gaul, Florian Goßler, Daniel Kreßner,
Robert Luce, Christian Mehl, Matthias Pester, Robert Polzin, Timo Reis, Olivier
Sète, Tatjana Stykel, Elif Topcu, Wolfgang Wülling, and Andreas Zeiser.

We also thank the staff of the Vieweg+Teubner Verlag and, in particular, Ulrike 
Schmickler-Hirzebruch, who strongly supported this endeavor. 


Berlin, July 2011
Jörg Liesen
Volker Mehrmann


² MATLAB® trademark of The MathWorks Inc.
³ „Eine gute Theorie ist das Praktischste, was es gibt.“
Chapter 1
Linear Algebra in Every Day Life 


One has to familiarize the student with actual questions from applications, so that he learns
to deal with real world problems.¹


Lothar Collatz (1910–1990)


1.1 The PageRank Algorithm 


The PageRank algorithm is a method to assess the “importance” of documents with 
mutual links, such as web pages, on the basis of the link structure. It was developed 
by Sergey Brin and Larry Page, the founders of Google Inc., at Stanford University
in the late 1990s. The basic idea of the algorithm is the following: 

Instead of counting links, PageRank essentially interprets a link of page A to page 
B as a vote of page A for page B. PageRank then assesses the importance of a page 
by the number of received votes. PageRank also considers the importance of the 
page that casts the vote, since votes of some pages have a higher value, and thus also 
assign a higher value to the page they point to. Important pages will be rated higher 
and thus lead to a higher position in the search results.²

Let us describe (model) this idea mathematically. Our presentation uses ideas from
the article [BryL06]. For a given set of web pages, every page $k$ will be assigned
an importance value $x_k \geq 0$. A page $k$ is more important than a page $j$ if $x_k > x_j$.
If a page $k$ has a link to a page $j$, we say that page $j$ has a backlink from page $k$.
In the above description these backlinks are the votes. As an example, consider the
following link structure:

[Figure: link structure of four web pages, numbered 1 to 4]


Here the page 1 has links to the pages 2, 3 and 4, and a backlink from page 3.

¹ „Man muss den Lernenden mit konkreten Fragestellungen aus den Anwendungen vertraut machen,
dass er lernt, konkrete Fragen zu behandeln.“

² Translation of a text found in 2010 on http://www.google.de/corporate/tech.html.

© Springer International Publishing Switzerland 2015

The easiest approach to define the importance of a web page is to count its backlinks;
the more votes are cast for a page, the more important the page is. In our example
this gives the importance values


\[ x_1 = 1, \quad x_2 = 3, \quad x_3 = 2, \quad x_4 = 3. \]


The pages 2 and 4 are thus the most important pages, and they are equally important. 

However, intuition and also the above description from Google suggest that
backlinks from important pages are more important for the value of a page than those
from less important pages. This idea can be modeled by defining $x_k$ as the sum of all
importance values of the backlinks of the page $k$. In our example this results in four
equations that have to be satisfied simultaneously,


\[ x_1 = x_3, \quad x_2 = x_1 + x_3 + x_4, \quad x_3 = x_1 + x_4, \quad x_4 = x_1 + x_2 + x_3. \]


A disadvantage of this approach is that it does not consider the number of links 
of the pages. Thus, it would be possible to (significantly) increase the importance of 
a page just by adding links to that page. In order to avoid this, the importance values 
of the backlinks in the PageRank algorithm are divided by the number of links of the 
corresponding page. This creates a kind of “internet democracy”: Every page can 
vote for other pages, where in total it can cast one vote. In our example this gives the 
equations 


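\[
x_1 = \frac{x_3}{3}, \quad
x_2 = \frac{x_1}{3} + \frac{x_3}{3} + \frac{x_4}{2}, \quad
x_3 = \frac{x_1}{3} + \frac{x_4}{2}, \quad
x_4 = \frac{x_1}{3} + x_2 + \frac{x_3}{3}. \tag{1.1}
\]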


These are four equations for the four unknowns, and all equations are linear,³ i.e.,
the unknowns occur only in first power. In Chap. 6 we will see how to write the
equations in (1.1) in the form of a linear system of equations. Analyzing and solving
such systems is one of the most important tasks of Linear Algebra. The example of
the PageRank algorithm shows that Linear Algebra presents a powerful modeling
tool: We have turned the real world problem of assessing the importance of web
pages into a problem of Linear Algebra. This problem will be examined further in
Sect. 8.3.

³ The term “linear” originates from the Latin word “linea”, which means “(straight) line”, and
“linearis” means “consisting of (straight) lines”.

For completeness, we mention that a solution for the four unknowns (computed 
with MATLAB and rounded to the second significant digit) is given by 


\[ x_1 = 0.14, \quad x_2 = 0.54, \quad x_3 = 0.41, \quad x_4 = 0.72. \]


Thus, page 4 is the most important one. It is possible to multiply the solution, i.e., the
importance values $x_k$, by a positive constant. Such a multiplication or scaling is often
advantageous for computational methods or for the visual display of the results. For 
example, the scaling could be used to give the most important page the value 1.00. 
A scaling is allowed, since it does not change the ranking of the pages, which is the 
essential information provided by the PageRank algorithm. 
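
The equations (1.1) can also be solved approximately on a computer. The following Python sketch is not the MATLAB computation mentioned above; it uses power iteration, a technique swapped in here purely for illustration. The outgoing links are inferred from the backlink equations of the example, and the names `links` and `pagerank` as well as the scaling to Euclidean length 1 are assumptions of this sketch.

```python
import math

# Outgoing links of the four pages, inferred from the backlink equations
# of the example (an assumption of this sketch).
links = {1: [2, 3, 4], 2: [4], 3: [1, 2, 4], 4: [2, 3]}


def pagerank(links, iterations=200):
    """Approximate the importance values by repeatedly redistributing votes."""
    pages = sorted(links)
    x = {k: 1.0 / len(pages) for k in pages}  # start with equal importance
    for _ in range(iterations):
        new_x = {k: 0.0 for k in pages}
        for k in pages:
            share = x[k] / len(links[k])  # page k casts one vote in total
            for j in links[k]:
                new_x[j] += share
        x = new_x
    # Scale to Euclidean length 1; a positive scaling keeps the ranking.
    norm = math.sqrt(sum(v * v for v in x.values()))
    return {k: v / norm for k, v in x.items()}


x = pagerank(links)
for k in sorted(x):
    print(f"x_{k} = {x[k]:.2f}")
```

With this scaling the rounded values agree with the solution stated in the text, and page 4 again receives the largest value.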


1.2 No Claim Discounting in Car Insurances 


Insurance companies compute the premiums for their customers on the basis of the 
insured risk: the higher the risk, the higher the premium. It is therefore important to 
identify the factors that lead to higher risk. In the case of a car insurance these factors 
include the number of miles driven per year, the distance between home and work, 
the marital status, the engine power, or the age of the driver. Using such information, 
the company calculates the initial premium. 

Usually the best indicator for future accidents, and hence future insurance claims, 
is the number of accidents of the individual customer in the past, i.e., the claims
history. In order to incorporate this information into the premium rates, insurers 
establish a system of risk classes, which divide the customers into homogeneous risk 
groups with respect to their previous claims history. Customers with fewer accidents 
in the past get a discount on their premium. This approach is called a no claims 
discounting scheme. 

For a mathematical model of this scheme we need a set of risk classes and a 
transition rule for moving between the classes. At the end of a policy year, the 
customer may move to a different class depending on the claims made during the 
year. The discount is given in percent of the premium in the initial class. As a simple 
example we consider four risk classes, 


Class         $C_1$   $C_2$   $C_3$   $C_4$
% discount      0      10      20      40


and the following transition rules: 


• No accident: Step up one class (or stay in $C_4$).
• One accident: Step back one class (or stay in $C_1$).
• More than one accident: Step back to class $C_1$ (or stay in $C_1$).


Next, the insurance company has to estimate the probability that a customer who
is in the class $C_i$ in this year will move to the class $C_j$. This probability is denoted
by $p_{ij}$. Let us assume, for simplicity, that the probability of exactly one accident for
every customer is 0.1, i.e., 10%, and the probability of two or more accidents for
every customer is 0.05, i.e., 5%. (Of course, in practice the insurance companies
determine these probabilities in dependence of the classes.)

For example, a customer in the class $C_1$ will stay in $C_1$ in case of at least one
accident. This happens with the probability 0.15, so that $p_{11} = 0.15$. A customer in
$C_1$ has no accident with the probability 0.85, so that $p_{12} = 0.85$. There is no chance
to move from $C_1$ to $C_3$ or $C_4$ in the next year, so that $p_{13} = p_{14} = 0.00$. In this way
we obtain 16 values $p_{ij}$, $i, j = 1, 2, 3, 4$, which we can arrange in a $4 \times 4$ matrix as
follows:


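\[
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34} \\
p_{41} & p_{42} & p_{43} & p_{44}
\end{bmatrix}
=
\begin{bmatrix}
0.15 & 0.85 & 0.00 & 0.00 \\
0.15 & 0.00 & 0.85 & 0.00 \\
0.05 & 0.10 & 0.00 & 0.85 \\
0.05 & 0.00 & 0.10 & 0.85
\end{bmatrix}. \tag{1.2}
\]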


All entries of this matrix are nonnegative real numbers, and the sum of all entries in
each row is equal to 1.00, i.e.,


\[ p_{i1} + p_{i2} + p_{i3} + p_{i4} = 1.00 \quad \text{for each } i = 1, 2, 3, 4. \]


Such a matrix is called row-stochastic. 

The analysis of matrix properties is a central topic of Linear Algebra that is 
developed throughout this book. As in the example with the PageRank algorithm, 
we have translated a practical problem into the language of Linear Algebra, and we 
can now study it using Linear Algebra techniques. This example of premium rates 
will be discussed further in Example 4.7. 
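
The matrix in (1.2) can be generated directly from the three transition rules. The following Python sketch does this and checks that every row sums to 1; the function name `transition_matrix` and the zero-based class indices are assumptions of this sketch, while the probabilities 0.1 and 0.05 are those from the text.

```python
P_ONE = 0.10   # probability of exactly one accident (from the text)
P_MORE = 0.05  # probability of two or more accidents (from the text)
P_NONE = 1.0 - P_ONE - P_MORE

N_CLASSES = 4


def transition_matrix():
    """Build the 4 x 4 matrix of probabilities p_ij from the transition rules."""
    P = [[0.0] * N_CLASSES for _ in range(N_CLASSES)]
    for i in range(N_CLASSES):          # index 0 corresponds to class C_1
        up = min(i + 1, N_CLASSES - 1)  # no accident: step up or stay in C_4
        down = max(i - 1, 0)            # one accident: step back or stay in C_1
        P[i][up] += P_NONE
        P[i][down] += P_ONE
        P[i][0] += P_MORE               # more than one accident: back to C_1
    return P


P = transition_matrix()
for row in P:
    print(["%.2f" % entry for entry in row], "row sum = %.2f" % sum(row))
```

Accumulating with `+=` matters in the boundary classes, where two different rules lead to the same target class.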


1.3 Production Planning in a Plant 


The production planning in a plant has to consider many different factors, in par- 
ticular commodity prices, labor costs, and available capital, in order to determine a 
production plan. We consider a simple example: 

A company produces the products $P_1$ and $P_2$. If $x_i$ units of the product $P_i$ are
produced, where $i = 1, 2$, then the pair $(x_1, x_2)$ is called a production plan. Suppose
that the raw materials and labor for the production of one unit of the product $P_i$
cost $a_{1i}$ and $a_{2i}$ Euros, respectively. If $b_1$ Euros are available for the purchase of raw
materials and $b_2$ Euros for the payment of labor costs, then a production plan must
satisfy the constraint inequalities


\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &\leq b_1, \\
a_{21} x_1 + a_{22} x_2 &\leq b_2.
\end{aligned}
\]


If a production plan satisfies these constraints, it is called feasible. Let $p_i$ be the profit
from selling one unit of product $P_i$. Then the goal is to determine a production plan
that maximizes the profit function


\[ \Phi(x_1, x_2) = p_1 x_1 + p_2 x_2. \]


How can we find this maximum? 
The two equations


\[ a_{11} x_1 + a_{12} x_2 = b_1 \quad \text{and} \quad a_{21} x_1 + a_{22} x_2 = b_2 \]


describe straight lines in the coordinate system that has the variables $x_1$ and $x_2$ on its
axes. These two lines form boundary lines of the feasible production plans, which are
“below” the lines; see the figure below. Note that we also must have $x_i \geq 0$, since we
cannot produce negative units of a product. For planned profits $y_i$, $i = 1, 2, 3, \dots$,
the equations $p_1 x_1 + p_2 x_2 = y_i$ describe parallel straight lines in the coordinate
system; see the dashed lines in the figure. If $x_1$ and $x_2$ satisfy $p_1 x_1 + p_2 x_2 = y_i$, then
$\Phi(x_1, x_2) = y_i$. The profit maximization problem can now be solved by moving the
dashed lines until one of them reaches the corner with the maximal $y$:


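[Figure: the lines $a_{11} x_1 + a_{12} x_2 = b_1$ and $a_{21} x_1 + a_{22} x_2 = b_2$ in the $(x_1, x_2)$ coordinate system, bounding the region of feasible plans; dashed parallel profit lines; the optimal plan lies at a corner of the feasible region]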








In case of more variables we cannot draw such a simple figure and obtain the 
solution “graphically”. But the general idea of finding a corner with the maximum 
profit is still the same. This is an example of a linear optimization problem. As before, 
we have formulated a real world problem in the language of Linear Algebra, and we 
can use mathematical methods for its solution. 
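
The idea of searching the corners can be sketched in a few lines of Python. All numbers below are hypothetical and chosen only for illustration; the names `intersect`, `is_feasible` and `profit` are likewise assumptions of this sketch. It computes the intersection points of the boundary lines (including the two coordinate axes), keeps the feasible ones, and picks the corner with the largest profit. This brute-force search is only practical for very small problems.

```python
from itertools import combinations

# Hypothetical data, chosen only for illustration (not from the text).
A = [[1.0, 2.0], [3.0, 1.0]]   # rows: (a_11, a_12) and (a_21, a_22)
b = [8.0, 9.0]                 # available budgets b_1, b_2
p = [2.0, 3.0]                 # profits per unit p_1, p_2

# Each boundary line is stored as (c1, c2, d), meaning c1*x1 + c2*x2 = d.
lines = [
    (A[0][0], A[0][1], b[0]),
    (A[1][0], A[1][1], b[1]),
    (1.0, 0.0, 0.0),  # the axis x1 = 0
    (0.0, 1.0, 0.0),  # the axis x2 = 0
]


def intersect(l1, l2):
    """Intersection point of two lines by Cramer's rule, or None if parallel."""
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        return None
    x1 = (l1[2] * l2[1] - l1[1] * l2[2]) / det
    x2 = (l1[0] * l2[2] - l1[2] * l2[0]) / det
    return (x1, x2)


def is_feasible(x, eps=1e-9):
    """Check nonnegativity and both constraint inequalities."""
    if x[0] < -eps or x[1] < -eps:
        return False
    return all(A[i][0] * x[0] + A[i][1] * x[1] <= b[i] + eps for i in range(2))


def profit(x):
    return p[0] * x[0] + p[1] * x[1]


points = (intersect(l1, l2) for l1, l2 in combinations(lines, 2))
corners = [x for x in points if x is not None and is_feasible(x)]
best = max(corners, key=profit)
print("optimal plan:", best, "profit:", profit(best))
```

For these hypothetical numbers the optimal plan is the intersection of the two constraint lines, $(x_1, x_2) = (2, 3)$, with profit 13.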


1.4 Predicting Future Profits 


The prediction of profits or losses of a company is a central planning instrument of 
economics. Analogous problems arise in many areas of political decision making, 
for example in budget planning, tax estimates or the planning of new infrastructures.
We consider a specific example: 

In the four quarters of a year a company has profits of 10, 8, 9, 11 million Euros. 
The board now wants to predict the future profits development on the basis of these
values. Evidence suggests that the profits behave linearly. If this were true, then
the profits would form a straight line $y(t) = \alpha t + \beta$ that connects the points
$(1, 10)$, $(2, 8)$, $(3, 9)$, $(4, 11)$ in the coordinate system having “time” and “profit”
as its axes. This, however, holds neither in this example nor in practice. Therefore
one tries to find a straight line that deviates “as little as possible” from the given
points. One possible approach is to choose the parameters $\alpha$ and $\beta$ in order to minimize
the sum of the squared distances between the given points and the straight line.
Once the parameters $\alpha$ and $\beta$ have been determined, the resulting line $y(t)$ can be
used for estimating or predicting the future profits, as illustrated in the following
figure:

[Figure: the four data points and the fitted straight line $y(t)$, with “time” on the horizontal axis and “profit” on the vertical axis]





The determination of the parameters $\alpha$ and $\beta$ that minimize a sum of squares is
called a least squares problem. We will solve least squares problems using methods
of Linear Algebra in Example 12.16. The approach itself is sometimes called a
parameter identification. In Statistics, the modeling of given data (here the company
profits) using a linear predictor function (here $y(t) = \alpha t + \beta$) is known as linear
regression.
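
For the four profit values of this example the line can be computed directly. The following Python sketch assumes that "distance" means the vertical distance between a data point and the line, and it uses the standard closed-form solution for a straight-line fit rather than the method developed later in the book; the names `fit_line` and `profits` are assumptions of this sketch.

```python
def fit_line(points):
    """Return (alpha, beta) minimizing the sum of (alpha*t + beta - y)**2."""
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in points)
    s_tt = sum((t - t_mean) ** 2 for t, _ in points)
    alpha = s_ty / s_tt
    beta = y_mean - alpha * t_mean
    return alpha, beta


# Quarterly profits in million Euros, as given in the text.
profits = [(1, 10), (2, 8), (3, 9), (4, 11)]
alpha, beta = fit_line(profits)
prediction = alpha * 5 + beta  # estimate for the following quarter
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, y(5) = {prediction:.2f}")
```

Under the stated assumption this gives $\alpha = 0.4$ and $\beta = 8.5$, so the fitted line predicts a profit of 10.5 million Euros for the fifth quarter.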


1.5 Circuit Simulation 


The current development of electronic devices is very rapid. In short intervals, nowadays
often less than a year, new models of laptops or mobile phones have to be brought
to the market. To achieve this, new generations of computer chips have to be developed
continuously. These typically become smaller and more powerful, and naturally
should use as little energy as possible. An important factor in this development is
to plan and simulate the chips virtually, i.e., in the computer and without producing
a physical prototype. This model-based planning and optimization of products is a
central method in many high technology areas, and it is based on modern mathematics.

Usually, the switching behavior of a chip is modeled by a mathematical system
consisting of differential and algebraic equations that describe the relation between 
currents and voltages. Without going into details, consider the following circuit: 


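[Figure: circuit with a voltage source $V_S(t)$, a resistor $R$, an inductor $L$, and a capacitor $C$, with the potential differences $V_R(t)$, $V_L(t)$, $V_C(t)$]


In this circuit description, $V_S(t)$ is the given input voltage at time $t$, and the
characteristic values of the components are $R$ for the resistor, $L$ for the inductor, and
$C$ for the capacitor. The functions for the potential differences at the three components
are denoted by $V_R(t)$, $V_L(t)$, and $V_C(t)$; $I(t)$ is the current.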

Applying the Kirchhoff laws⁴ of electrical engineering leads to the following
system of linear equations and differential equations that model the dynamic behavior
of the circuit:


\[
\begin{aligned}
L \frac{d}{dt} I &= V_L, \\
C \frac{d}{dt} V_C &= I, \\
R I &= V_R, \\
V_L + V_C + V_R &= V_S.
\end{aligned}
\]


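In this example it is easy to solve the last two equations for $V_L$ and $V_R$, and hence
to obtain a system of differential equations


\[
\begin{aligned}
\frac{d}{dt} I &= -\frac{R}{L} I - \frac{1}{L} V_C + \frac{1}{L} V_S, \\
\frac{d}{dt} V_C &= \frac{1}{C} I,
\end{aligned}
\]


for the functions $I$ and $V_C$. We will discuss and solve this system in Example 17.13.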

This simple example demonstrates that for the simulation of a circuit a system 
of linear differential equations and algebraic equations has to be solved. Modern 
computer chips in industrial practice require solving such systems with millions 
of differential-algebraic equations. Linear Algebra is one of the central tools for the
theoretical analysis of such systems as well as the development of efficient solution 
methods. 
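
To give a first impression of what "simulating" such a system means, the following Python sketch integrates the two differential equations for $I$ and $V_C$ with the forward Euler method. This is a deliberately simple scheme chosen for illustration, not the solution approach of Example 17.13. The component values $R = L = C = 1$, the constant input, the zero initial values, and the name `simulate` are all hypothetical.

```python
def simulate(R, L, C, v_s, t_end, dt=1e-3):
    """Forward Euler integration of the circuit equations, starting at rest.

    dI/dt   = (-R*I - V_C + V_S(t)) / L
    dV_C/dt = I / C
    """
    current, v_c = 0.0, 0.0
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        d_current = (-R * current - v_c + v_s(t)) / L
        d_v_c = current / C
        current += dt * d_current
        v_c += dt * d_v_c
    return current, v_c


# Hypothetical component values and a constant input of 1.
current, v_c = simulate(R=1.0, L=1.0, C=1.0, v_s=lambda t: 1.0, t_end=40.0)
print(f"I = {current:.6f}, V_C = {v_c:.6f}")
```

For a constant input the capacitor voltage approaches the input value and the current decays to zero, which is what the simulated values show after a sufficiently long time.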


⁴ Gustav Robert Kirchhoff (1824–1887).


Chapter 2 
Basic Mathematical Concepts 


In this chapter we introduce the mathematical concepts that form the basis for the 
developments in the following chapters. We begin with sets and basic mathematical 
logic. Then we consider maps between sets and their most important properties. 
Finally we discuss relations and in particular equivalence relations on a set. 


2.1 Sets and Mathematical Logic 


We begin our development with the concept of a set and use the following definition
of Cantor.¹


Definition 2.1 A set is a collection $M$ of well determined and distinguishable objects
$x$ of our perception or our thinking. The objects are called the elements of $M$.


The objects $x$ in this definition are well determined, and therefore we can uniquely
decide whether $x$ belongs to a set $M$ or not. We write $x \in M$ if $x$ is an element of the
set $M$, otherwise we write $x \notin M$. Furthermore, the elements are distinguishable,
which means that all elements of $M$ are (pairwise) distinct.

If two objects $x$ and $y$ are equal, then we write $x = y$, otherwise $x \neq y$. For
mathematical objects we usually have to give a formal definition of equality. As an
example consider the equality of sets; see Definition 2.2 below.

We describe sets with curly brackets { } that contain either a list of the elements,
for example


\[ \{\text{red}, \text{yellow}, \text{green}\}, \quad \{1, 2, 3, 4\}, \quad \{2, 4, 6, \dots\}, \]


or a defining property, for example


\[ \{x \mid x \text{ is a positive even number}\}, \quad \{x \mid x \text{ is a person owning a bike}\}. \]


¹ Georg Cantor (1845–1918), one of the founders of set theory. Cantor published this definition in
the journal “Mathematische Annalen” in 1895.

© Springer International Publishing Switzerland 2015
J. Liesen and V. Mehrmann, Linear Algebra, Springer Undergraduate
Mathematics Series, DOI 10.1007/978-3-319-24346-7_2


Some of the well known sets of numbers are denoted as follows: 


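\[
\begin{aligned}
\mathbb{N} &= \{1, 2, 3, \dots\} && \text{(the natural numbers)}, \\
\mathbb{N}_0 &= \{0, 1, 2, \dots\} && \text{(the natural numbers including zero)}, \\
\mathbb{Z} &= \{\dots, -2, -1, 0, 1, 2, \dots\} && \text{(the integers)}, \\
\mathbb{Q} &= \{x \mid x = a/b \text{ with } a \in \mathbb{Z} \text{ and } b \in \mathbb{N}\} && \text{(the rational numbers)}, \\
\mathbb{R} &= \{x \mid x \text{ is a real number}\} && \text{(the real numbers)}.
\end{aligned}
\]


The construction and characterization of the real numbers $\mathbb{R}$ is usually done in an
introductory course in Real Analysis.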

To describe a set via its defining property we formally write {x | P(x)}. Here 
P is a predicate which may hold for an object x or not, and P(x) is the assertion 
“P holds for x”. 

In general, an assertion is a statement that can be classified as either “true” or
“false”. For instance the statement “The set $\mathbb{N}$ has infinitely many elements” is true.
The sentence “Tomorrow the weather will be good” is not an assertion, since the
meaning of the term “good weather” is unclear and the weather prediction in general
is uncertain.

The negation of an assertion $A$ is the assertion “not $A$”, which we denote by $\neg A$.
This assertion is true if and only if $A$ is false, and false if and only if $A$ is true. For
instance, the negation of the true assertion “The set $\mathbb{N}$ has infinitely many elements”
is given by “The set $\mathbb{N}$ does not have infinitely many elements” (or “The set $\mathbb{N}$ has
finitely many elements”), which is false.

Two assertions $A$ and $B$ can be combined via logical compositions to a new
assertion. The following is a list of the most common logical compositions, together
with their mathematical shorthand notation:


Composition    Notation             Wording
conjunction    $\wedge$             $A$ and $B$
disjunction    $\vee$               $A$ or $B$
implication    $\Rightarrow$        $A$ implies $B$
                                    If $A$ then $B$
                                    $A$ is a sufficient condition for $B$
                                    $B$ is a necessary condition for $A$
equivalence    $\Leftrightarrow$    $A$ and $B$ are equivalent
                                    $A$ is true if and only if $B$ is true
                                    $A$ is necessary and sufficient for $B$
                                    $B$ is necessary and sufficient for $A$
For example, we can write the assertion “$x$ is a real number and $x$ is negative” as
$x \in \mathbb{R} \wedge x < 0$. Whether an assertion that is composed of two assertions $A$ and $B$ is
true or false, depends on the logical values of $A$ and $B$. We have the following table
of logical values (“t” and “f” denote true and false, respectively):


$A$   $B$   $A \wedge B$   $A \vee B$   $A \Rightarrow B$   $A \Leftrightarrow B$
 t     t         t              t                t                     t
 t     f         f              t                f                     f
 f     t         f              t                t                     f
 f     f         f              f                t                     t
For example, the assertion $A \wedge B$ is true only when $A$ and $B$ are both true. The
assertion $A \Rightarrow B$ is false only when $A$ is true and $B$ is false. In particular, if $A$ is
false, then $A \Rightarrow B$ is true, independent of the logical value of $B$.

Thus, $3 < 5 \Rightarrow 2 < 4$ is true, since $3 < 5$ and $2 < 4$ are both true. But
$3 < 5 \Rightarrow 2 > 4$ is false, since $2 > 4$ is false. On the other hand, the assertions
$4 < 2 \Rightarrow 3 > 5$ and $4 < 2 \Rightarrow 3 < 5$ are both true, since $4 < 2$ is false.

In the following we often have to prove that certain implications $A \Rightarrow B$ are true.
As the table of logical values shows and the example illustrates, we then only have to
prove that under the assumption that $A$ is true the assertion $B$ is true as well. Instead
of “Assume that $A$ is true” we will often write “Let $A$ hold”.

It is easy to see that 


\[ (A \Rightarrow B) \Leftrightarrow (\neg B \Rightarrow \neg A). \]


(As an exercise create the table of logical values for $\neg B \Rightarrow \neg A$ and compare it with
the table for $A \Rightarrow B$.) The truth of $A \Rightarrow B$ can therefore be proved by showing that
the truth of $\neg B$ implies the truth of $\neg A$, i.e., that “$B$ is false” implies “$A$ is false”.
The assertion $\neg B \Rightarrow \neg A$ is called the contraposition of the assertion $A \Rightarrow B$ and
the conclusion from $A \Rightarrow B$ to $\neg B \Rightarrow \neg A$ is called proof by contraposition.

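
The comparison of the two tables of logical values can also be carried out mechanically. The following Python sketch enumerates all four combinations of logical values; the helper `implies` is an assumption of this sketch and encodes the rule stated above, namely that an implication is false only when its first assertion is true and its second is false.

```python
from itertools import product


def implies(a, b):
    """Logical value of 'a implies b': false only if a is true and b is false."""
    return (not a) or b


rows = []
for a, b in product([True, False], repeat=2):
    direct = implies(a, b)                  # A => B
    contrapositive = implies(not b, not a)  # (not B) => (not A)
    rows.append((a, b, direct, contrapositive))
    print(a, b, direct, contrapositive)

# The two columns agree in every row, so the assertions are equivalent.
equivalent = all(direct == contra for _, _, direct, contra in rows)
print("equivalent:", equivalent)
```

Since there are only finitely many combinations of logical values, such an enumeration is a complete check of the equivalence.
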
Together with assertions we also often use so-called quantifiers: 





Quantifier     Notation     Wording
universal      $\forall$    For all
existential    $\exists$    There exists


Now we return to set theory and introduce subsets and the equality of sets. 


Definition 2.2 Let $M$, $N$ be sets.


(1) $M$ is called a subset of $N$, denoted by $M \subseteq N$, if every element of $M$ is also an
element of $N$. We write $M \nsubseteq N$ if this does not hold.

(2) $M$ and $N$ are called equal, denoted by $M = N$, if $M \subseteq N$ and $N \subseteq M$. We
write $M \neq N$ if this does not hold.

(3) $M$ is called a proper subset of $N$, denoted by $M \subset N$, if both $M \subseteq N$ and
$M \neq N$ hold.


Using the notation of mathematical logic we can write this definition as follows:


(1) $M \subseteq N \;\Leftrightarrow\; (\forall x : x \in M \Rightarrow x \in N)$.
(2) $M = N \;\Leftrightarrow\; (M \subseteq N \wedge N \subseteq M)$.
(3) $M \subset N \;\Leftrightarrow\; (M \subseteq N \wedge M \neq N)$.


The assertion on the right side of the equivalence in (1) reads as follows: For all
objects $x$ the truth of $x \in M$ implies the truth of $x \in N$. Or shorter: For all $x$, if
$x \in M$ holds, then $x \in N$ holds.

A very special set is the set with no elements, which we define formally as follows. 


Definition 2.3 The set $\emptyset := \{x \mid x \neq x\}$ is called the empty set.


The notation “:=” means is defined as. We have introduced the empty set by a
defining property: Every object $x$ with $x \neq x$ is an element of $\emptyset$. This cannot hold
for any object, and hence $\emptyset$ does not contain any element. A set that contains at least
one element is called nonempty.


Theorem 2.4 For every set $M$ the following assertions hold:


(1) $\emptyset \subseteq M$.
(2) $M \subseteq \emptyset \Rightarrow M = \emptyset$.


Proof


(1) We have to show that the assertion “$\forall x : x \in \emptyset \Rightarrow x \in M$” is true. Since there
is no $x \in \emptyset$, the assertion “$x \in \emptyset$” is false, and therefore “$x \in \emptyset \Rightarrow x \in M$” is
true for every $x$ (cf. the remarks on the implication $A \Rightarrow B$).

(2) Let $M \subseteq \emptyset$. From (1) we know that $\emptyset \subseteq M$ and hence $M = \emptyset$ follows by (2)
in Definition 2.2. □


Theorem 2.5 Let $M$, $N$, $L$ be sets. Then the following assertions hold for the subset
relation “$\subseteq$”:


(1) $M \subseteq M$ (reflexivity).
(2) If $M \subseteq N$ and $N \subseteq L$, then $M \subseteq L$ (transitivity).


Proof


(1) We have to show that the assertion “$\forall x : x \in M \Rightarrow x \in M$” is true. If “$x \in M$”
is true, then “$x \in M \Rightarrow x \in M$” is an implication with two true assertions, and
hence it is true.

(2) We have to show that the assertion “$\forall x : x \in M \Rightarrow x \in L$” is true. If “$x \in M$”
is true, then also “$x \in N$” is true, since $M \subseteq N$. The truth of “$x \in N$” implies
that “$x \in L$” is true, since $N \subseteq L$. Hence the assertion “$x \in M \Rightarrow x \in L$” is
true. □
Definition 2.6 Let $M$, $N$ be sets.


(1) The union² of $M$ and $N$ is $M \cup N := \{x \mid x \in M \vee x \in N\}$.
(2) The intersection of $M$ and $N$ is $M \cap N := \{x \mid x \in M \wedge x \in N\}$.
(3) The difference of $M$ and $N$ is $M \setminus N := \{x \mid x \in M \wedge x \notin N\}$.


If $M \cap N = \emptyset$, then the sets $M$ and $N$ are called disjoint. The set operations union
and intersection can be extended to more than two sets: If $I \neq \emptyset$ is a set and if for
all $i \in I$ there is a set $M_i$, then


\[
\bigcup_{i \in I} M_i := \{x \mid \exists\, i \in I \text{ with } x \in M_i\}
\quad \text{and} \quad
\bigcap_{i \in I} M_i := \{x \mid \forall\, i \in I \text{ we have } x \in M_i\}.
\]


The set $I$ is called an index set. For $I = \{1, 2, \dots, n\} \subset \mathbb{N}$ we write the union and
intersection of the sets $M_1, M_2, \dots, M_n$ as


\[
\bigcup_{i=1}^{n} M_i \quad \text{and} \quad \bigcap_{i=1}^{n} M_i.
\]


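Theorem 2.7 Let $M \subseteq N$ for two sets $M$, $N$. Then the following are equivalent:


(1) $M \subset N$.
(2) $N \setminus M \neq \emptyset$.


Proof We show that (1) $\Rightarrow$ (2) and (2) $\Rightarrow$ (1) hold.


(1) $\Rightarrow$ (2): Since $M \neq N$, there exists an $x \in N$ with $x \notin M$. Thus $x \in N \setminus M$, so
that $N \setminus M \neq \emptyset$ holds.

(2) $\Rightarrow$ (1): There exists an $x \in N$ with $x \notin M$, and hence $N \neq M$. Since $M \subseteq N$
holds, we see that $M \subset N$ holds. □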


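Theorem 2.8 Let $M$, $N$, $L$ be sets. Then the following assertions hold:


(1) $M \cap N \subseteq M$ and $M \subseteq M \cup N$.

(2) Commutativity: $M \cap N = N \cap M$ and $M \cup N = N \cup M$.

(3) Associativity: $M \cap (N \cap L) = (M \cap N) \cap L$ and $M \cup (N \cup L) = (M \cup N) \cup L$.

(4) Distributivity: $M \cup (N \cap L) = (M \cup N) \cap (M \cup L)$ and $M \cap (N \cup L) =
(M \cap N) \cup (M \cap L)$.

(5) $M \setminus N \subseteq M$.

(6) $M \setminus (N \cap L) = (M \setminus N) \cup (M \setminus L)$ and $M \setminus (N \cup L) = (M \setminus N) \cap (M \setminus L)$.


Proof Exercise. □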
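
Before attempting the proof, it can be helpful to test the six assertions of Theorem 2.8 on concrete sets. The following Python sketch uses three hypothetical sample sets and Python's built-in set operations (`|` for union, `&` for intersection, `-` for difference, `<=` for subset). Such a test on examples is of course no substitute for a proof.

```python
# Hypothetical sample sets, chosen only for illustration.
M = {1, 2, 3, 4}
N = {3, 4, 5}
L = {1, 4, 6}

checks = {
    "(1)": (M & N) <= M and M <= (M | N),
    "(2)": (M & N) == (N & M) and (M | N) == (N | M),
    "(3)": (M & (N & L)) == ((M & N) & L)
           and (M | (N | L)) == ((M | N) | L),
    "(4)": (M | (N & L)) == ((M | N) & (M | L))
           and (M & (N | L)) == ((M & N) | (M & L)),
    "(5)": (M - N) <= M,
    "(6)": (M - (N & L)) == ((M - N) | (M - L))
           and (M - (N | L)) == ((M - N) & (M - L)),
}

for label, holds in checks.items():
    print(label, holds)
```

Trying other sample sets, including the empty set, is a quick way to build intuition for why each assertion holds.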


² The notations $M \cup N$ and $M \cap N$ for union and intersection of sets $M$ and $N$ were introduced
in 1888 by Giuseppe Peano (1858–1932), one of the founders of formal logic. The notation of the
“smallest common multiple $\mathfrak{M}(M, N)$” and “largest common divisor $\mathfrak{D}(M, N)$” of the sets $M$ and
$N$ suggested by Georg Cantor (1845–1918) did not catch on.
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Definition 2.9 Let M be a set. 


(1) The cardinality of M, denoted by |M |, is the number of elements of M. 
(1) The power set of M, denoted by P(M), is the set of all subsets of M, 1.e., 
P(M) :={N|N C M}. 


The empty set ∅ has cardinality zero and P(∅) = {∅}, thus |P(∅)| = 1. For
M = {1, 3} the cardinality is |M| = 2 and

P(M) = {∅, {1}, {3}, M},

and hence |P(M)| = 4 = 2^{|M|}. One can show that for every set M with finitely many
elements, i.e., finite cardinality, |P(M)| = 2^{|M|} holds.
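
As a small illustration of the power set and of the formula |P(M)| = 2^{|M|}, the following Python sketch enumerates all subsets of a finite set. The helper name `power_set` is hypothetical and chosen only for this example.

```python
from itertools import combinations

def power_set(m):
    """Return the set of all subsets of the finite set m, each as a frozenset."""
    elements = list(m)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    }

M = {1, 3}
P = power_set(M)
print(sorted(sorted(s) for s in P))
print(len(P) == 2 ** len(M))
```

Subsets are stored as `frozenset` objects because ordinary Python sets are not hashable and therefore cannot be elements of another set.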


2.2 Maps 


In this section we discuss maps between sets. 


Definition 2.10 Let X, Y be nonempty sets. 


(1) A map f from X to Y is a rule that assigns to each x ∈ X exactly one y =
f(x) ∈ Y. We write this as

f : X → Y, x ↦ y = f(x).

Instead of x ↦ y = f(x) we also write f(x) = y. The sets X and Y are called
domain and codomain of f.

(2) Two maps f : X → Y and g : X → Y are called equal when f(x) = g(x)
holds for all x ∈ X. We then write f = g.


In Definition 2.10 we have assumed that X and Y are nonempty, since otherwise 
there can be no rule that assigns an element of Y to each element of X. If one of 
these sets is empty, one can define an empty map. However, in the following we will 
always assume (but not always explicitly state) that the sets between which a given 
map acts are nonempty. 


Example 2.11 Two maps from X = ℝ to Y = ℝ are given by

f : X → Y, f(x) = x², (2.1)

g : X → Y, x ↦ 0 if x ≤ 0, and x ↦ 1 if x > 0. (2.2)


To analyze the properties of maps we need some further terminology. 
Definition 2.12 Let X, Y be nonempty sets.


(1) The map Id_X : X → X, x ↦ x, is called the identity on X.
(2) Let f : X → Y be a map and let M ⊆ X and N ⊆ Y. Then

f(M) := {f(x) | x ∈ M} ⊆ Y is called the image of M under f,
f^{-1}(N) := {x ∈ X | f(x) ∈ N} is called the pre-image of N under f.

(3) If f : X → Y, x ↦ f(x) is a map and ∅ ≠ M ⊆ X, then f|_M : M → Y,
x ↦ f(x), is called the restriction of f to M.


One should note that in this definition f^{-1}(N) is a set, and hence the symbol f^{-1}
here does not mean the inverse map of f. (This map will be introduced below in
Definition 2.21.)


Example 2.13 For the maps with domain X = ℝ in (2.1) and (2.2) we have the
following properties:

f(X) = {x ∈ ℝ | x ≥ 0}, f^{-1}(ℝ₋) = {0}, f^{-1}({−1}) = ∅,
g(X) = {0, 1}, g^{-1}(ℝ₋) = g^{-1}({0}) = ℝ₋,

where ℝ₋ := {x ∈ ℝ | x ≤ 0}.
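
Images and pre-images can be computed directly when the domain is finite. The sketch below is an illustration under the assumption that a finite window of integers stands in for the domain ℝ; the helper names `image` and `preimage` are hypothetical.

```python
def image(f, subset):
    """Image f(M) of a finite subset M of the domain."""
    return {f(x) for x in subset}

def preimage(f, domain, target):
    """Pre-image of `target` under f, restricted to the finite set `domain`."""
    return {x for x in domain if f(x) in target}

def f(x):
    return x * x               # the map (2.1)

def g(x):
    return 0 if x <= 0 else 1  # the map (2.2)

X = set(range(-3, 4))                   # finite stand-in for the domain
nonpositive = {x for x in X if x <= 0}  # finite stand-in for the nonpositive reals

print(image(f, X))
print(preimage(f, X, nonpositive))
print(preimage(f, X, {-1}))
print(preimage(g, X, {0}) == nonpositive)
```

Note that `preimage` returns a set even when f is not injective, which mirrors the remark that f^{-1}(N) denotes a set and not an inverse map.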


Definition 2.14 Let X, Y be nonempty sets. A map f : X → Y is called


(1) injective, if for all x_1, x_2 ∈ X the equality f(x_1) = f(x_2) implies that x_1 = x_2,
(2) surjective, if f(X) = Y,
(3) bijective, if f is injective and surjective.


For every nonempty set X the simplest example of a bijective map from X to X
is Id_X, the identity on X.


Example 2.15 Let ℝ₊ := {x ∈ ℝ | x ≥ 0}, then

f : ℝ → ℝ, f(x) = x², is neither injective nor surjective.

f : ℝ → ℝ₊, f(x) = x², is surjective but not injective.

f : ℝ₊ → ℝ, f(x) = x², is injective but not surjective.

f : ℝ₊ → ℝ₊, f(x) = x², is bijective.

In these assertions we have used the continuity of the map f(x) = x² that is discussed
in the basic courses on analysis. In particular, we have used the fact that continuous
functions map real intervals to real intervals. The assertions also show why it is
important to include the domain and codomain in the definition of a map.


Theorem 2.16 A map f : X → Y is bijective if and only if for every y ∈ Y there
exists exactly one x ∈ X with f(x) = y.


Proof ⇒: Let f be bijective and let y_1 ∈ Y. Since f is surjective, there exists an
x_1 ∈ X with f(x_1) = y_1. If some x_2 ∈ X also satisfies f(x_2) = y_1, then x_1 = x_2
follows from the injectivity of f. Therefore, there exists a unique x_1 ∈ X with
f(x_1) = y_1.

⇐: Since for all y ∈ Y there exists a unique x ∈ X with f(x) = y, it follows that
f(X) = Y. Thus, f is surjective. Let now x_1, x_2 ∈ X with f(x_1) = f(x_2) = y ∈ Y.
Then the assumption implies x_1 = x_2, so that f is also injective. □


One can show that between two sets X and Y of finite cardinality there exists a
bijective map if and only if |X| = |Y|.


Lemma 2.17 For sets X, Y with |X| = |Y| = m ∈ ℕ, there exist exactly m! :=
1 · 2 · ... · m pairwise distinct bijective maps between X and Y.

Proof Exercise. □
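
For small finite sets the count in Lemma 2.17 can be confirmed by enumeration. The following Python sketch represents a map as a dictionary; the helper names are hypothetical, and the sketch assumes that the elements of each set can be sorted.

```python
from itertools import permutations
from math import factorial

def bijections(X, Y):
    """List all bijective maps X -> Y between finite sets, each as a dict."""
    xs, ys = sorted(X), sorted(Y)
    if len(xs) != len(ys):
        return []  # no bijection exists when the cardinalities differ
    return [dict(zip(xs, perm)) for perm in permutations(ys)]

def is_injective(f):
    """A finite map is injective if no value is hit twice."""
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    """A finite map is surjective if its image is the whole codomain."""
    return set(f.values()) == set(Y)

X, Y = {1, 2, 3}, {"a", "b", "c"}
maps = bijections(X, Y)
print(len(maps) == factorial(len(X)))
```

Each permutation of Y gives one assignment of values to the sorted elements of X, which is the counting idea behind m!.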


Definition 2.18 Let f : X → Y, x ↦ f(x), and g : Y → Z, y ↦ g(y) be maps.
Then the composition of f and g is the map

g ∘ f : X → Z, x ↦ g(f(x)).


The expression g ∘ f should be read “g after f”, which stresses the order of the
composition: First f is applied to x and then g to f(x). One immediately sees that
f ∘ Id_X = f = Id_Y ∘ f for every map f : X → Y.


Theorem 2.19 Let f : W → X, g : X → Y, h : Y → Z be maps. Then


(1) h ∘ (g ∘ f) = (h ∘ g) ∘ f, i.e., the composition of maps is associative.
(2) If f and g are injective/surjective/bijective, then g ∘ f is injective/
surjective/bijective.


Proof Exercise. □


Theorem 2.20 A map f : X → Y is bijective if and only if there exists a map
g : Y → X with

g ∘ f = Id_X and f ∘ g = Id_Y.


Proof ⇒: If f is bijective, then by Theorem 2.16 for every y ∈ Y there exists an
x = x_y ∈ X with f(x_y) = y. We define the map g by

g : Y → X, g(y) = x_y.

Let y ∈ Y be given, then

(f ∘ g)(y) = f(g(y)) = f(x_y) = y, hence f ∘ g = Id_Y.

If, on the other hand, x ∈ X is given, then y = f(x) ∈ Y. By Theorem 2.16, there
exists a unique x_y ∈ X with f(x_y) = y such that x = x_y. So with

(g ∘ f)(x) = (g ∘ f)(x_y) = g(f(x_y)) = g(y) = x_y = x,

we have g ∘ f = Id_X.

⇐: By assumption g ∘ f = Id_X, thus g ∘ f is injective and thus also f is injective
(see Exercise 2.7). Moreover, f ∘ g = Id_Y, thus f ∘ g is surjective and hence also f
is surjective (see Exercise 2.7). Therefore, f is bijective. □


The map g : Y → X that was characterized in Theorem 2.20 is unique: If there
were another map h : Y → X with h ∘ f = Id_X and f ∘ h = Id_Y, then

h = Id_X ∘ h = (g ∘ f) ∘ h = g ∘ (f ∘ h) = g ∘ Id_Y = g.


This leads to the following definition. 


Definition 2.21 If f : X → Y is a bijective map, then the unique map g : Y → X
from Theorem 2.20 is called the inverse (or inverse map) of f. We denote the inverse
of f by f^{-1}.

To show that a given map g : Y → X is the unique inverse of the bijective map
f : X → Y, it is sufficient to show one of the equations g ∘ f = Id_X or f ∘ g = Id_Y.
Indeed, if f is bijective and g ∘ f = Id_X, then

g = g ∘ Id_Y = g ∘ (f ∘ f^{-1}) = (g ∘ f) ∘ f^{-1} = Id_X ∘ f^{-1} = f^{-1}.

In the same way g = f^{-1} follows from the assumption f ∘ g = Id_Y.


Theorem 2.22 If f : X → Y and g : Y → Z are bijective maps, then the following
assertions hold:


(1) f^{-1} is bijective with (f^{-1})^{-1} = f.
(2) g ∘ f is bijective with (g ∘ f)^{-1} = f^{-1} ∘ g^{-1}.


Proof

(1) Exercise.
(2) We know from Theorem 2.19 that g ∘ f : X → Z is bijective. Therefore, there
exists a (unique) inverse of g ∘ f. For the map f^{-1} ∘ g^{-1} we have

(f^{-1} ∘ g^{-1}) ∘ (g ∘ f) = f^{-1} ∘ (g^{-1} ∘ (g ∘ f)) = f^{-1} ∘ ((g^{-1} ∘ g) ∘ f)
= f^{-1} ∘ (Id_Y ∘ f) = f^{-1} ∘ f = Id_X.

Hence, f^{-1} ∘ g^{-1} is the inverse of g ∘ f. □
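
The rule (g ∘ f)^{-1} = f^{-1} ∘ g^{-1} can be tried out on finite maps stored as dictionaries. This is a minimal sketch; the helper names `compose` and `inverse` and the sample maps are assumptions made for illustration only.

```python
def compose(g, f):
    """Return g o f as a dict: x is sent to g[f[x]]."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Invert a bijective map given as a dict."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("map is not injective, so it has no inverse")
    return inv

f = {1: "a", 2: "b", 3: "c"}        # bijective map X -> Y
g = {"a": 30, "b": 10, "c": 20}     # bijective map Y -> Z

lhs = inverse(compose(g, f))            # (g o f)^{-1}
rhs = compose(inverse(f), inverse(g))   # f^{-1} o g^{-1}
print(lhs == rhs)
```

The order of the factors is reversed on the right hand side because g is applied last in g ∘ f and must therefore be undone first.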


2.3 Relations 
We first introduce the cartesian product³ of two sets.


³ Named after René Descartes (1596–1650), the founder of Analytic Geometry. Georg Cantor (1845–1918)
used in 1895 the name “connection set of M and N” and the notation (M.N) = {(m, n)}.

Definition 2.23 If M, N are nonempty sets, then the set

M × N := {(x, y) | x ∈ M ∧ y ∈ N}

is the cartesian product of M and N. An element (x, y) ∈ M × N is called an
(ordered) pair.


We can easily generalize this definition to n ∈ ℕ nonempty sets M_1, ..., M_n:

M_1 × ... × M_n := {(x_1, ..., x_n) | x_i ∈ M_i for i = 1, ..., n},

where an element (x_1, ..., x_n) ∈ M_1 × ... × M_n is called an (ordered) n-tuple. The
n-fold cartesian product of a single nonempty set M is

M^n := M × ... × M (n times) = {(x_1, ..., x_n) | x_i ∈ M for i = 1, ..., n}.

If in these definitions at least one of the sets is empty, then the resulting cartesian
product is the empty set as well.


Definition 2.24 If M, N are nonempty sets then a set R ⊆ M × N is called a relation
between M and N. If M = N, then R is called a relation on M. Instead of (x, y) ∈ R
we also write x ~_R y or x ~ y, if it is clear which relation is considered.


If in this definition at least one of the sets M and N is empty, then every relation
between M and N is also the empty set, since then M × N = ∅.
If, for instance M = ℕ and N = ℚ, then

R = {(x, y) ∈ M × N | xy = 1}

is a relation between M and N that can be expressed as

R = {(1, 1), (2, 1/2), (3, 1/3), ...} = {(n, 1/n) | n ∈ ℕ}.


Definition 2.25 A relation R on a set M is called 


(1) reflexive, if x ~ x holds for all x ∈ M,
(2) symmetric, if (x ~ y) ⇒ (y ~ x) holds for all x, y ∈ M,
(3) transitive, if (x ~ y ∧ y ~ z) ⇒ (x ~ z) holds for all x, y, z ∈ M.


If R is reflexive, transitive and symmetric, then it is called an equivalence relation 
on M. 


Example 2.26 
(1) Let R = {(x, y) ∈ ℚ² | x = −y}. Then R is not reflexive, since x = −x holds
only for x = 0. If x = −y, then also y = −x, and hence R is symmetric.
Finally, R is not transitive. For example, (x, y) = (1, −1) ∈ R and (y, z) =
(−1, 1) ∈ R, but (x, z) = (1, 1) ∉ R.

(2) The relation R = {(x, y) ∈ ℤ² | x ≤ y} is reflexive and transitive, but not
symmetric.

(3) If f : ℝ → ℝ is a map, then R = {(x, y) ∈ ℝ² | f(x) = f(y)} is an
equivalence relation on ℝ.
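
The three properties of Definition 2.25 can be tested mechanically for relations on a finite set. The sketch below restricts the relations of Example 2.26 to a small window of integers; this restriction, the helper names, and the choice f(x) = x² in the third relation are assumptions made for illustration.

```python
def is_reflexive(R, M):
    """x ~ x for all x in M."""
    return all((x, x) in R for x in M)

def is_symmetric(R):
    """x ~ y implies y ~ x."""
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    """x ~ y and y ~ z imply x ~ z."""
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

M = range(-2, 3)
R1 = {(x, y) for x in M for y in M if x == -y}          # Example 2.26 (1)
R2 = {(x, y) for x in M for y in M if x <= y}           # Example 2.26 (2)
R3 = {(x, y) for x in M for y in M if x * x == y * y}   # Example 2.26 (3), f(x) = x^2

for name, R in [("R1", R1), ("R2", R2), ("R3", R3)]:
    print(name, is_reflexive(R, M), is_symmetric(R), is_transitive(R))
```

Only the third relation passes all three checks, in line with the statement that relations of the form f(x) = f(y) are equivalence relations.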


Definition 2.27 Let R be an equivalence relation on the set M. Then, for x ∈ M the
set

[x]_R := {y ∈ M | (x, y) ∈ R} = {y ∈ M | x ~ y}

is called the equivalence class of x with respect to R. The set of equivalence classes

M/R := {[x]_R | x ∈ M}

is called the quotient set of M with respect to R.


The equivalence class [x]_R of an element x ∈ M is never the empty set, since always
x ~ x (reflexivity) and therefore x ∈ [x]_R. If it is clear which equivalence relation
R is meant, we often write [x] instead of [x]_R and also skip the additional “with
respect to R”.


Theorem 2.28 If R is an equivalence relation on the set M and if x, y ∈ M, then
the following are equivalent:


(1) [x] = [y].
(2) [x] ∩ [y] ≠ ∅.
(3) x ~ y.


Proof

(1) ⇒ (2): Since x ~ x, it follows that x ∈ [x]. From [x] = [y] it follows that
x ∈ [y] and thus x ∈ [x] ∩ [y].

(2) ⇒ (3): Since [x] ∩ [y] ≠ ∅, there exists a z ∈ [x] ∩ [y]. For this element z we
have x ~ z and y ~ z, and thus x ~ z and z ~ y (symmetry) and, therefore,
x ~ y (transitivity).

(3) ⇒ (1): Let x ~ y and z ∈ [x], i.e., x ~ z. Using symmetry and transitivity, we
obtain y ~ z, and hence z ∈ [y]. This means that [x] ⊆ [y]. In an analogous
way one shows that [y] ⊆ [x], and hence [x] = [y] holds. □


Theorem 2.28 shows that for two equivalence classes [x] and [y] we have either
[x] = [y] or [x] ∩ [y] = ∅. Thus every x ∈ M is contained in exactly one equivalence
class (namely in [x]), so that an equivalence relation R yields a partitioning or
decomposition of M into mutually disjoint subsets. Every element of [x] is called a 
representative of the equivalence class [x]. A very useful and general approach that 
we will often use in this book is to partition a set of objects (e.g. sets of matrices) into 
equivalence classes, and to find in each such class a representative with a particularly 
simple structure. Such a representative is called a normal form with respect to the 
given equivalence relation. 
Example 2.29 For a given number n ∈ ℕ the set

R_n := {(a, b) ∈ ℤ² | a − b is divisible by n without remainder}

is an equivalence relation on ℤ, since the following properties hold:

• Reflexivity: a − a = 0 is divisible by n without remainder.

• Symmetry: If a − b is divisible by n without remainder, then also b − a.

• Transitivity: Let a − b and b − c be divisible by n without remainder and write
a − c = (a − b) + (b − c). Both summands on the right are divisible by n without
remainder and hence this also holds for a − c.


For a ∈ ℤ the equivalence class [a] is called residue class of a modulo n, and
[a] = a + nℤ := {a + nz | z ∈ ℤ}. The equivalence relation R_n yields a partitioning
of ℤ into n mutually disjoint subsets. In particular, we have

[0] ∪ [1] ∪ ... ∪ [n − 1] = ⋃_{a=0}^{n−1} [a] = ℤ.

The set of all residue classes modulo n, i.e., the quotient set with respect to R_n, is
often denoted by ℤ/nℤ. Thus, ℤ/nℤ := {[0], [1], ..., [n − 1]}. This set plays an
important role in the mathematical field of Number Theory.
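
The partition of ℤ into residue classes can be made visible on a finite window of integers. The following Python sketch is an illustration only; the function name `residue_classes` and the chosen window are assumptions. It relies on the fact that Python's `%` operator returns a result in {0, ..., n − 1} for a positive modulus n, also for negative a.

```python
def residue_classes(n, window):
    """Group the integers in `window` by their residue class modulo n."""
    classes = {}
    for a in window:
        classes.setdefault(a % n, set()).add(a)
    return classes

n = 3
window = range(-6, 7)
classes = residue_classes(n, window)
for r in sorted(classes):
    print(f"[{r}] contains {sorted(classes[r])}")
```

Two integers land in the same class exactly when their difference is divisible by n, which is the defining condition of R_n.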


Exercises 
2.1 Let A, B, C be assertions. Show that the following assertions are true:

(a) For ∧ and ∨ the associative laws

[(A ∧ B) ∧ C] ⇔ [A ∧ (B ∧ C)], [(A ∨ B) ∨ C] ⇔ [A ∨ (B ∨ C)]

hold.
(b) For ∧ and ∨ the commutative laws

(A ∧ B) ⇔ (B ∧ A), (A ∨ B) ⇔ (B ∨ A)

hold.
(c) For ∧ and ∨ the distributive laws

[(A ∧ B) ∨ C] ⇔ [(A ∨ C) ∧ (B ∨ C)], [(A ∨ B) ∧ C] ⇔ [(A ∧ C) ∨ (B ∧ C)]

hold.


2.2 Let A, B, C be assertions. Show that the following assertions are true:

(a) A ∧ B ⇒ A.
(b) [A ⇔ B] ⇔ [(A ⇒ B) ∧ (B ⇒ A)].
(c) ¬(A ∨ B) ⇔ [(¬A) ∧ (¬B)].
(d) ¬(A ∧ B) ⇔ [(¬A) ∨ (¬B)].
(e) [(A ⇒ B) ∧ (B ⇒ C)] ⇒ [A ⇒ C].
(f) [A ⇒ (B ∨ C)] ⇔ [(A ∧ ¬B) ⇒ C].

(The assertions (c) and (d) are called the De Morgan laws for ∧ and ∨.)

2.3 Prove Theorem 2.8.

2.4 Show that for two sets M, N the following holds:

N ⊆ M ⇔ M ∩ N = N ⇔ M ∪ N = M.

2.5 Let X, Y be nonempty sets, U, V ⊆ Y nonempty subsets and let f : X → Y
be a map. Show that f^{-1}(U ∩ V) = f^{-1}(U) ∩ f^{-1}(V). Let U, V ⊆ X be
nonempty. Check whether f(U ∪ V) = f(U) ∪ f(V) holds.

2.6 Are the following maps injective, surjective, bijective?

(a) f_1 : ℝ \ {0} → ℝ, x ↦ 1/x.
(b) f_2 : ℝ² → ℝ, (x, y) ↦ x + y.
(c) f_3 : ℝ² → ℝ, (x, y) ↦ x² + y − 1.
(d) f_4 : ℕ → ℤ, n ↦ …

2.7 Show that for two maps f : X → Y and g : Y → Z the following assertions
hold:

(a) g ∘ f is surjective ⇒ g is surjective.
(b) g ∘ f is injective ⇒ f is injective.

2.8 Let a ∈ ℤ be given. Show that the map f_a : ℤ → ℤ, f_a(x) = x + a is
bijective.

2.9 Prove Lemma 2.17.

2.10 Prove Theorem 2.19.

2.11 Prove Theorem 2.22 (1).

2.12 Find two maps f, g : ℕ → ℕ, so that simultaneously

(a) f is not surjective,
(b) g is not injective, and
(c) g ∘ f is bijective.

2.13 Determine all equivalence relations on the set {1, 2}.

2.14 Determine a symmetric and transitive relation on the set {a, b, c} that is not
reflexive.


Chapter 3 
Algebraic Structures 


An algebraic structure is a set with operations between its elements that follow certain 
rules. As an example of such a structure consider the integers and the operation ‘+.’ 
What are the properties of this addition? Already in elementary school one learns 
that the sum a + b of two integers a and b is another integer. Moreover, there is 
a number 0 such that 0 + a = a for every integer a, and for every integer a there
exists an integer −a such that (−a) + a = 0. The analysis of the properties of such
concrete examples leads to definitions of abstract concepts that are built on a few 
simple axioms. For the integers and the operation addition, this leads to the algebraic 
structure of a group. 

This principle of abstraction from concrete examples is one of the strengths and
basic working principles of Mathematics. By “extracting and completely exposing
the mathematical kernel” (David Hilbert) we also simplify our further work:
Every proved assertion about an abstract concept automatically holds for all concrete
examples. Moreover, by combining defined concepts we can move to further
generalizations and in this way extend the mathematical theory step by step. Hermann
Günther Graßmann (1809–1877) described this procedure as follows¹: “... the
mathematical method moves forward from the simplest concepts to combinations of
them and gains via such combinations new and more general concepts.”


3.1 Groups 


We begin with a set and an operation with specific properties. 


¹ “... die mathematische Methode hingegen schreitet von den einfachsten Begriffen zu den
zusammengesetzteren fort, und gewinnt so durch Verknüpfung des Besonderen neue und allgemeinere
Begriffe.”

Definition 3.1 A group is a set G with a map, called operation,

⊕ : G × G → G, (a, b) ↦ a ⊕ b,

that satisfies the following:

(1) The operation ⊕ is associative, i.e., (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) holds for all
a, b, c ∈ G.
(2) There exists an element e ∈ G, called a neutral element, for which

(a) e ⊕ a = a for all a ∈ G, and
(b) for every a ∈ G there exists an ã ∈ G, called an inverse element of a, with
ã ⊕ a = e.

If a ⊕ b = b ⊕ a holds for all a, b ∈ G, then the group is called commutative or
Abelian.²


As short hand notation for a group we use (G, ⊕) or just G, if it is clear which
operation is used.
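
For a finite set the axioms of Definition 3.1 can be checked by brute force. The sketch below is an illustration under stated assumptions: the function name `is_group` is hypothetical, and the examples use addition and multiplication modulo 5 on sets of integers, which are not introduced in the text at this point.

```python
from itertools import product

def is_group(G, op):
    """Brute-force check of the axioms in Definition 3.1 for a finite set G."""
    G = list(G)
    # The operation must map G x G into G.
    if any(op(a, b) not in G for a, b in product(G, repeat=2)):
        return False
    # (1) Associativity.
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(G, repeat=3)):
        return False
    # (2) A neutral element e with op(e, a) = a, and for every a some b with op(b, a) = e.
    for e in G:
        if all(op(e, a) == a for a in G) and all(
            any(op(b, a) == e for b in G) for a in G
        ):
            return True
    return False

add_mod5 = lambda a, b: (a + b) % 5
mul_mod5 = lambda a, b: (a * b) % 5

print(is_group(range(5), add_mod5))      # {0,...,4} with addition modulo 5
print(is_group(range(1, 5), mul_mod5))   # {1,...,4} with multiplication modulo 5
print(is_group(range(5), mul_mod5))      # fails: 0 has no inverse element
```

The check only asks for a left neutral element and left inverses, exactly as in the definition; that these are automatically two-sided is a consequence of the axioms and is not assumed by the code.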


Theorem 3.2 For every group (G, ⊕) the following assertions hold:


(1) If e ∈ G is a neutral element and if a, ã ∈ G with ã ⊕ a = e, then also a ⊕ ã = e.
(2) If e ∈ G is a neutral element and if a ∈ G, then also a ⊕ e = a.

(3) G contains exactly one neutral element.

(4) For every a ∈ G there exists a unique inverse element.


Proof

(1) Let e ∈ G be a neutral element and let a, ã ∈ G satisfy ã ⊕ a = e. Then by
Definition 3.1 there exists an element a_1 ∈ G with a_1 ⊕ ã = e. Thus,

a ⊕ ã = e ⊕ (a ⊕ ã) = (a_1 ⊕ ã) ⊕ (a ⊕ ã) = a_1 ⊕ ((ã ⊕ a) ⊕ ã)
= a_1 ⊕ (e ⊕ ã) = a_1 ⊕ ã = e.

(2) Let e ∈ G be a neutral element and let a ∈ G. Then there exists ã ∈ G with
ã ⊕ a = e. By (1) then also a ⊕ ã = e and it follows that

a ⊕ e = a ⊕ (ã ⊕ a) = (a ⊕ ã) ⊕ a = e ⊕ a = a.

(3) Let e, e_1 ∈ G be two neutral elements. Then e_1 ⊕ e = e, since e_1 is a neutral
element. Since e is also a neutral element, it follows that e_1 = e ⊕ e_1 = e_1 ⊕ e,
where for the second identity we have used assertion (2). Hence, e = e_1.

(4) Let ã, a_1 ∈ G be two inverse elements of a ∈ G and let e ∈ G be the (unique)
neutral element. Then with (1) and (2) it follows that

ã = e ⊕ ã = (a_1 ⊕ a) ⊕ ã = a_1 ⊕ (a ⊕ ã) = a_1 ⊕ e = a_1. □


² Named after Niels Henrik Abel (1802–1829), the founder of group theory.
Example 3.3


(1) (ℤ, +), (ℚ, +) and (ℝ, +) are commutative groups. In all these groups the neutral
element is the number 0 (zero) and the inverse of a is the number −a. Instead
of a + (−b) we usually write a − b. Since the operation is the addition, these
groups are also called additive groups.

The natural numbers ℕ with the addition do not form a group, since there is no
neutral element in ℕ. If we consider the set ℕ₀, which includes also the number
0 (zero), then 0 + a = a + 0 = a for all a ∈ ℕ₀, but only a = 0 has an inverse
element in ℕ₀. Hence also ℕ₀ with the addition does not form a group.

(2) The sets ℚ \ {0} and ℝ \ {0} with the usual multiplication form commutative
groups. In these multiplicative groups, the neutral element is the number 1 (one)
and the inverse element of a is the number 1/a (or a^{-1}). Instead of a · b^{-1} we also
write the fraction a/b.

The integers ℤ with the multiplication do not form a group. The set ℤ includes
the number 1, for which 1 · a = a · 1 = a for all a ∈ ℤ, but no a ∈ ℤ \ {−1, 1}
has an inverse element in ℤ.


Definition 3.4 Let (G, ⊕) be a group and H ⊆ G. If (H, ⊕) is a group, then it is
called a subgroup of (G, ⊕).


The next theorem gives an alternative characterization of a subgroup.


Theorem 3.5 (H, ⊕) is a subgroup of the group (G, ⊕) if and only if the following
properties hold:


(1) ∅ ≠ H ⊆ G.
(2) a ⊕ b ∈ H for all a, b ∈ H.
(3) For every a ∈ H also the inverse element satisfies ã ∈ H.


Proof Exercise. □


The following definition characterizes maps between two groups which are compatible
with the respective group operations.


Definition 3.6 Let (G_1, ⊕) and (G_2, ⊛) be groups. A map

φ : G_1 → G_2, g ↦ φ(g)

is called a group homomorphism, if

φ(a ⊕ b) = φ(a) ⊛ φ(b) for all a, b ∈ G_1.

A bijective group homomorphism is called a group isomorphism.
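
The homomorphism condition can be verified exhaustively when the first group is finite. The following Python sketch is an illustration only; the function name `is_homomorphism` is hypothetical, and the two groups (integers with addition modulo 6 and modulo 3) are assumptions chosen for this example.

```python
from itertools import product

def is_homomorphism(phi, G1, op1, op2):
    """Check phi(op1(a, b)) == op2(phi(a), phi(b)) for all a, b in the finite set G1."""
    return all(
        phi(op1(a, b)) == op2(phi(a), phi(b)) for a, b in product(G1, repeat=2)
    )

G1 = range(6)
add_mod6 = lambda a, b: (a + b) % 6   # operation on G1 = {0,...,5}
add_mod3 = lambda a, b: (a + b) % 3   # operation on G2 = {0,1,2}

reduce_mod3 = lambda a: a % 3         # compatible with both operations
shift = lambda a: (a + 1) % 3         # not compatible

print(is_homomorphism(reduce_mod3, G1, add_mod6, add_mod3))
print(is_homomorphism(shift, G1, add_mod6, add_mod3))
```

The second map fails already for a = b = 0, since it does not send the neutral element of the first group to the neutral element of the second.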
3.2 Rings and Fields


In this section we extend the concept of a group and discuss mathematical structures
that are characterized by two operations. As motivating example consider the integers
with the addition, i.e., the group (ℤ, +). We can multiply the elements of ℤ and this
multiplication is associative, i.e., (a · b) · c = a · (b · c) for all a, b, c ∈ ℤ. Furthermore
the addition and multiplication satisfy the distributive laws a · (b + c) = a · b + a · c
and (a + b) · c = a · c + b · c for all integers a, b, c. These properties make ℤ with
addition and multiplication into a ring.


Definition 3.7 A ring is a set R with two operations

+ : R × R → R, (a, b) ↦ a + b, (addition)
∗ : R × R → R, (a, b) ↦ a ∗ b, (multiplication)

that satisfy the following:

(1) (R, +) is a commutative group.
We call the neutral element in this group zero, and write 0. We denote the inverse
element of a ∈ R by −a, and write a − b instead of a + (−b).
(2) The multiplication is associative, i.e., (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ R.
(3) The distributive laws hold, i.e., for all a, b, c ∈ R we have

a ∗ (b + c) = a ∗ b + a ∗ c,
(a + b) ∗ c = a ∗ c + b ∗ c.

A ring is called commutative if a ∗ b = b ∗ a for all a, b ∈ R.
An element 1 ∈ R is called unit if 1 ∗ a = a ∗ 1 = a for all a ∈ R. In this case R is
called a ring with unit.


On the right hand side of the two distributive laws we have omitted the parentheses,
since multiplication is supposed to bind stronger than addition, i.e., a + (b ∗ c) =
a + b ∗ c. If it is useful for illustration purposes we nevertheless use parentheses,
e.g., we sometimes write (a ∗ b) + (c ∗ d) instead of a ∗ b + c ∗ d.

Analogous to the notation for groups we denote a ring with (R, +, ∗) or just with
R, if the operations are clear from the context.

In a ring with unit, the unit element is unique: If 1, e ∈ R satisfy 1 ∗ a = a ∗ 1 = a
and e ∗ a = a ∗ e = a for all a ∈ R, then in particular e = e ∗ 1 = 1.

For a_1, a_2, ..., a_n ∈ R we use the following abbreviations for the sum and product
of these elements:

∑_{j=1}^{n} a_j := a_1 + a_2 + ... + a_n and ∏_{j=1}^{n} a_j := a_1 ∗ a_2 ∗ ... ∗ a_n.

Moreover, a^n := ∏_{j=1}^{n} a for all a ∈ R and n ∈ ℕ. If ℓ > k, then we define the
empty sum as

∑_{j=ℓ}^{k} a_j := 0.

In a ring with unit we also define for ℓ > k the empty product as

∏_{j=ℓ}^{k} a_j := 1.

Theorem 3.8 For every ring R the following assertions hold:


(1) 0 ∗ a = a ∗ 0 = 0 for all a ∈ R.
(2) a ∗ (−b) = −(a ∗ b) = (−a) ∗ b and (−a) ∗ (−b) = a ∗ b for all a, b ∈ R.


Proof

(1) For every a ∈ R we have 0 ∗ a = (0 + 0) ∗ a = (0 ∗ a) + (0 ∗ a). Adding
−(0 ∗ a) on the left and right hand sides of this equality we obtain 0 = 0 ∗ a. In
the same way we can show that a ∗ 0 = 0 for all a ∈ R.

(2) Since (a ∗ b) + (a ∗ (−b)) = a ∗ (b + (−b)) = a ∗ 0 = 0, it follows that a ∗ (−b)
is the (unique) additive inverse of a ∗ b, i.e., a ∗ (−b) = −(a ∗ b). In the same
way we can show that (−a) ∗ b = −(a ∗ b). Furthermore, we have

0 = 0 ∗ (−b) = (a + (−a)) ∗ (−b) = a ∗ (−b) + (−a) ∗ (−b)
= −(a ∗ b) + (−a) ∗ (−b),

and thus (−a) ∗ (−b) = a ∗ b. □


It is immediately clear that (ℤ, +, ∗) is a commutative ring with unit. This is the
standard example, by which the concept of a ring was modeled.


Example 3.9 Let M be a nonempty set and let R be the set of maps f : M → ℝ.
Then (R, +, ∗) with the operations

+ : R × R → R, (f, g) ↦ f + g, (f + g)(x) := f(x) + g(x),
∗ : R × R → R, (f, g) ↦ f ∗ g, (f ∗ g)(x) := f(x) · g(x),

is a commutative ring with unit. Here f(x) + g(x) and f(x) · g(x) are the sum and
product of two real numbers. The zero in this ring is the map 0_R : M → ℝ, x ↦ 0,
and the unit is the map 1_R : M → ℝ, x ↦ 1, where 0 and 1 are the real numbers
zero and one.


In the definition of a ring only additive inverse elements occur. We will now 
formally define the concept of a multiplicative inverse. 
Definition 3.10 Let (R, +, ∗) be a ring with unit. An element b ∈ R is called an
inverse of a ∈ R (with respect to ∗), if a ∗ b = b ∗ a = 1. An element of R that has
an inverse is called invertible.


It is clear from the definition that b ∈ R is an inverse of a ∈ R if and only if
a ∈ R is an inverse of b ∈ R. In general, however, not every element in a ring must
be (or is) invertible. But if an element is invertible, then it has a unique inverse, as
shown in the following theorem.


Theorem 3.11 Let (R, +, ∗) be a ring with unit.


(1) If a ∈ R is invertible, then the inverse is unique and we denote it by a^{-1}.

(2) If a, b ∈ R are invertible then a ∗ b ∈ R is invertible and (a ∗ b)^{-1} = b^{-1} ∗ a^{-1}.

Proof

(1) If b, b̃ ∈ R are inverses of a ∈ R, then b = b ∗ 1 = b ∗ (a ∗ b̃) = (b ∗ a) ∗ b̃ =
1 ∗ b̃ = b̃.

(2) Since a and b are invertible, b^{-1} ∗ a^{-1} ∈ R is well defined and

(b^{-1} ∗ a^{-1}) ∗ (a ∗ b) = ((b^{-1} ∗ a^{-1}) ∗ a) ∗ b = (b^{-1} ∗ (a^{-1} ∗ a)) ∗ b = b^{-1} ∗ b = 1.

In the same way we can show that (a ∗ b) ∗ (b^{-1} ∗ a^{-1}) = 1, and thus
(a ∗ b)^{-1} = b^{-1} ∗ a^{-1}. □


From an algebraic point of view the difference between the integers on the one 
hand, and the rational or real numbers on the other, is that in the sets Q and R every 
element (except for the number zero) is invertible. This “additional structure” makes 
Q and R into fields. 


Definition 3.12 A commutative ring R with unit is called a field, if 0 ≠ 1 and every
a ∈ R \ {0} is invertible.


By definition, every field is a commutative ring with unit, but the converse does 
not hold. One can also introduce the concept of a field based on the concept of a 
group (cp. Exercise 3.15). 


Definition 3.13 A field is a set K with two operations

+ : K × K → K, (a, b) ↦ a + b, (addition)
∗ : K × K → K, (a, b) ↦ a ∗ b, (multiplication)

that satisfy the following:

(1) (K, +) is a commutative group.
We call the neutral element in this group zero, and write 0. We denote the inverse
element of a ∈ K by −a, and write a − b instead of a + (−b).

(2) (K \ {0}, ∗) is a commutative group.
We call the neutral element in this group unit, and write 1. We denote the inverse
element of a ∈ K \ {0} by a^{-1}.

(3) The distributive laws hold, i.e., for all a, b, c ∈ K we have

a ∗ (b + c) = a ∗ b + a ∗ c,
(a + b) ∗ c = a ∗ c + b ∗ c.

We now show a few useful properties of fields. 


Lemma 3.14 For every field K the following assertions hold:


(1) K has at least two elements.

(2) 0 ∗ a = a ∗ 0 = 0 for all a ∈ K.

(3) a ∗ b = a ∗ c and a ≠ 0 imply that b = c for all a, b, c ∈ K.
(4) a ∗ b = 0 implies that a = 0 or b = 0 for all a, b ∈ K.


Proof

(1) This follows from the definition, since 0, 1 ∈ K with 0 ≠ 1.

(2) This has already been shown for rings (cp. Theorem 3.8).

(3) Since a ≠ 0, we know that a^{-1} exists. Multiplying both sides of a ∗ b = a ∗ c
from the left with a^{-1} yields b = c.

(4) Suppose that a ∗ b = 0. If a = 0, then we are finished. If a ≠ 0, then a^{-1} exists
and multiplying both sides of a ∗ b = 0 from the left with a^{-1} yields b = 0. □


For a ring R an element a ∈ R is called a zero divisor³, if there exists an element
b ∈ R \ {0} with a ∗ b = 0. The element a = 0 is called the trivial zero divisor. Property (4) in
Lemma 3.14 means that fields contain only the trivial zero divisor. There are also
rings in which property (4) holds, for instance the ring of integers ℤ. In later chapters
we will encounter rings of matrices that contain non-trivial zero divisors (see e.g. the
proof of Theorem 4.9 below).
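
A small ring with non-trivial zero divisors can be explored numerically. The sketch below is an illustration under an assumption that goes beyond the text at this point: it uses the set {0, 1, ..., n − 1} with addition and multiplication modulo n as a ring. The function name `zero_divisors` is hypothetical.

```python
def zero_divisors(n):
    """Zero divisors of {0, ..., n-1} with multiplication modulo n (for n >= 2)."""
    return {a for a in range(n) if any((a * b) % n == 0 for b in range(1, n))}

print(sorted(zero_divisors(6)))  # → [0, 2, 3, 4]
print(sorted(zero_divisors(5)))  # → [0]
```

For n = 6 the products 2 · 3 and 4 · 3 are multiples of 6, so 2, 3 and 4 are non-trivial zero divisors, whereas for n = 5 only the trivial zero divisor 0 appears.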

The following definition is analogous to the concepts of a subgroup (cp. Definition 3.4)
and a subring (cp. Exercise 3.14).


Definition 3.15 Let (K, +, ∗) be a field and L ⊆ K. If (L, +, ∗) is a field, then it
is called a subfield of (K, +, ∗).


As two very important examples for algebraic concepts discussed above we now
discuss the field of complex numbers and the ring of polynomials.


³ The concept of zero divisors was introduced in 1883 by Karl Theodor Wilhelm Weierstraß (1815–1897).
Example 3.16 The set of complex numbers is defined as

ℂ := {(x, y) | x, y ∈ ℝ} = ℝ × ℝ.

On this set we define the following operations as addition and multiplication:

+ : ℂ × ℂ → ℂ, (x_1, y_1) + (x_2, y_2) := (x_1 + x_2, y_1 + y_2),
· : ℂ × ℂ → ℂ, (x_1, y_1) · (x_2, y_2) := (x_1 · x_2 − y_1 · y_2, x_1 · y_2 + x_2 · y_1).

On the right hand sides we here use the addition and the multiplication in the field
ℝ. Then (ℂ, +, ·) is a field with the neutral elements with respect to addition and
multiplication given by

0_ℂ = (0, 0),
1_ℂ = (1, 0),

and the inverse elements with respect to addition and multiplication given by

−(x, y) = (−x, −y) for all (x, y) ∈ ℂ,
(x, y)^{-1} = (x/(x² + y²), −y/(x² + y²)) for all (x, y) ∈ ℂ \ {(0, 0)}.

In the multiplicative inverse element we have written a/b instead of a · b^{-1}, which is
the common notation in ℝ.

Considering the subset L := {(x, 0) | x ∈ ℝ} ⊂ ℂ, we can identify every x ∈ ℝ
with an element of the set L via the (bijective) map x ↦ (x, 0). In particular,
0_ℝ ↦ (0, 0) = 0_ℂ and 1_ℝ ↦ (1, 0) = 1_ℂ. Thus, we can interpret ℝ as subfield of ℂ
(although ℝ is not really a subset of ℂ), and we do not have to distinguish between
the zero and unit elements in ℝ and ℂ.

A special complex number is the imaginary unit (0, 1), which satisfies

(0, 1) · (0, 1) = (0 · 0 − 1 · 1, 0 · 1 + 0 · 1) = (−1, 0) = −1.

Here again we have identified the real number −1 with the complex number (−1, 0).
The imaginary unit is denoted by i, i.e.,

i := (0, 1),

and hence we can write i² = −1. Using the identification of x ∈ ℝ with (x, 0) ∈ ℂ
we can write z = (x, y) ∈ ℂ as

(x, y) = (x, 0) + (0, y) = (x, 0) + (0, 1) · (y, 0) = x + iy = Re(z) + i Im(z).

In the last expression Re(z) = x and Im(z) = y are the abbreviations for real
part and imaginary part of the complex number z = (x, y). Since (0, 1) · (y, 0) =
(y, 0) · (0, 1), i.e., iy = yi, it is justified to write the complex number x + iy as
x + yi.

For a given complex number z = (x, y) or z = x + iy the number z̄ := (x, −y),
respectively z̄ := x − iy, is called the associated complex conjugate number. Using
the (real) square root, the modulus or absolute value of a complex number is defined
as

|z| := (z z̄)^{1/2} = ((x + iy)(x − iy))^{1/2} = (x² − ixy + iyx − i²y²)^{1/2} = (x² + y²)^{1/2}.

(Again, for simplification we have omitted the multiplication sign between two complex
numbers.) This equation shows that the absolute value of a complex number is
a nonnegative real number. Further properties of complex numbers are stated in the
exercises at the end of this chapter.
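
The construction of ℂ as a set of pairs translates directly into code. The following Python sketch is a minimal illustration; the function names `c_add`, `c_mul`, `c_inv` are hypothetical, and rational numbers from the standard `fractions` module stand in for real numbers so that the arithmetic is exact.

```python
from fractions import Fraction

def c_add(z, w):
    """(x1, y1) + (x2, y2) := (x1 + x2, y1 + y2)."""
    return (z[0] + w[0], z[1] + w[1])

def c_mul(z, w):
    """(x1, y1) * (x2, y2) := (x1*x2 - y1*y2, x1*y2 + x2*y1)."""
    (x1, y1), (x2, y2) = z, w
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

def c_inv(z):
    """Multiplicative inverse of (x, y) != (0, 0)."""
    x, y = z
    d = x * x + y * y
    if d == 0:
        raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
    return (x / d, -y / d)

i = (Fraction(0), Fraction(1))
z = (Fraction(3), Fraction(4))

print(c_mul(i, i) == (-1, 0))          # i^2 = -1
print(c_mul(z, c_inv(z)) == (1, 0))    # z * z^{-1} = 1
print(c_mul(z, (z[0], -z[1])))         # z times its conjugate gives (x^2 + y^2, 0)
```

The last line shows that the product of z with its conjugate has vanishing imaginary part, which is why the modulus is a real number.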


Example 3.17 Let (R, +, ·) be a commutative ring with unit. A polynomial over R
and in the indeterminate or variable t is an expression of the form

p = α_0 · t^0 + α_1 · t^1 + ... + α_n · t^n,

where α_0, α_1, ..., α_n ∈ R are the coefficients of the polynomial. Instead of α_0 · t^0,
t^1 and α_j · t^j we often just write α_0, t and α_j t^j. The set of all polynomials over R
is denoted by R[t].

Let

p = α_0 + α_1 · t + ... + α_n · t^n, q = β_0 + β_1 · t + ... + β_m · t^m

be two polynomials in R[t] with n ≥ m. If n > m, then we set β_j = 0 for j =
m + 1, ..., n and call p and q equal, written p = q, if α_j = β_j for j = 0, 1, ..., n.
In particular, we have

α_0 + α_1 · t + ... + α_n · t^n = α_n · t^n + ... + α_1 · t + α_0,
0 + 0 · t + ... + 0 · t^n = 0.

The degree of the polynomial p = α_0 + α_1 · t + ... + α_n · t^n, denoted by deg(p),
is defined as the largest index j, for which α_j ≠ 0. If no such index exists, then the
polynomial is the zero polynomial p = 0 and we set deg(p) := −∞.

Let p, q ∈ R[t] as above have degrees n, m, respectively, with n ≥ m. If n > m,
then we again set β_j = 0, j = m + 1, ..., n. We define the following operations on
R[t]:

p + q := (α_0 + β_0) + (α_1 + β_1) · t + ... + (α_n + β_n) · t^n,

p ∗ q := γ_0 + γ_1 · t + ... + γ_{n+m} · t^{n+m}, γ_k := ∑_{i+j=k} α_i β_j.

With these operations (R[t], +, ∗) is a commutative ring with unit. The zero is given
by the polynomial p = 0 and the unit is p = 1 · t^0 = 1. But R[t] is not a field,
since not every polynomial p ∈ R[t] \ {0} is invertible, not even if R is a field. For
example, for p = t and any other polynomial q = β_0 + β_1 t + ... + β_m t^m ∈ R[t]
we have

p ∗ q = β_0 t + β_1 t² + ... + β_m t^{m+1} ≠ 1,

and hence p is not invertible.

In a polynomial we can “substitute” the variable $t$ by some other object when the
resulting expression can be evaluated algebraically. For example, we may substitute
$t$ by any $\lambda \in R$ and interpret the addition and multiplication as the corresponding
operations in the ring $R$. This defines a map from $R$ to $R$ by

\[ \lambda \mapsto p(\lambda) = \alpha_0 \cdot \lambda^0 + \alpha_1 \cdot \lambda^1 + \ldots + \alpha_n \cdot \lambda^n, \qquad \lambda^k := \underbrace{\lambda \cdot \ldots \cdot \lambda}_{k \text{ times}}, \quad k = 0, 1, \dots, n, \]

where $\lambda^0 = 1 \in R$ (this is an empty product). Here one should not confuse the ring
element $p(\lambda)$ with the polynomial $p$ itself, but rather think of $p(\lambda)$ as an evaluation
of $p$ at $\lambda$. We will study the properties of polynomials in more detail later on, and we
will also evaluate polynomials at other objects such as matrices or endomorphisms.
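The evaluation map $\lambda \mapsto p(\lambda)$ can be sketched as follows. This is a minimal illustration, assuming again a coefficient list `[alpha_0, ..., alpha_n]` and integer arguments; the name `poly_eval` is hypothetical.

```python
def poly_eval(coeffs, lam):
    """Evaluate alpha_0*lam^0 + ... + alpha_n*lam^n, with lam^0 = 1 (empty product)."""
    value = 0
    power = 1  # lam^0
    for alpha in coeffs:
        value += alpha * power
        power *= lam  # next power of lam
    return value


p = [1, -2, 1]          # 1 - 2t + t^2
print(poly_eval(p, 3))  # 4
print(poly_eval(p, 1))  # 0
```

Note that `p` is a list (the polynomial), while `poly_eval(p, 3)` is a ring element (the evaluation), which mirrors the distinction made above.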


Exercises 
3.1 Determine for the following $(M, \oplus)$ whether they form a group:

(a) $M = \{x \in \mathbb{R} \mid x > 0\}$ and $\oplus : M \times M \to M$, $(a, b) \mapsto a^b$.
(b) $M = \mathbb{R} \setminus \{0\}$ and $\oplus : M \times M \to M$, $(a, b) \mapsto \frac{a}{b}$.

3.2 Let $a, b \in \mathbb{R}$, the map

\[ f_{a,b} : \mathbb{R} \times \mathbb{R} \to \mathbb{R} \times \mathbb{R}, \quad (x, y) \mapsto (ax - by, ay), \]

and the set $G = \{f_{a,b} \mid a, b \in \mathbb{R},\ a \neq 0\}$ be given. Show that $(G, \circ)$ is a
commutative group, when the operation $\circ : G \times G \to G$ is defined as the
composition of two maps (cp. Definition 2.18).

3.3 Let $X \neq \emptyset$ be a set and let $S(X) = \{f : X \to X \mid f \text{ is bijective}\}$. Show that
$(S(X), \circ)$ is a group.

3.4 Let $(G, \oplus)$ be a group. For $a \in G$ denote by $-a \in G$ the (unique) inverse
element. Show the following rules for elements of $G$:

(a) $-(-a) = a$.
(b) $-(a \oplus b) = (-b) \oplus (-a)$.
(c) $a \oplus b_1 = a \oplus b_2 \Rightarrow b_1 = b_2$.
(d) $a_1 \oplus b = a_2 \oplus b \Rightarrow a_1 = a_2$.

3.5 Prove Theorem 3.5.

3.6 Let $(G, \oplus)$ be a group and for a fixed $a \in G$ let $Z_G(a) = \{g \in G \mid a \oplus g = g \oplus a\}$.
Show that $Z_G(a)$ is a subgroup of $G$.
(This subgroup of all elements of $G$ that commute with $a$ is called the centralizer
of $a$.)

3.7 Let $\varphi : G \to H$ be a group homomorphism. Show the following assertions:

(a) If $U \subseteq G$ is a subgroup, then also $\varphi(U) \subseteq H$ is a subgroup. If, furthermore,
$G$ is commutative, then also $\varphi(U)$ is commutative (even if $H$ is not
commutative).
(b) If $V \subseteq H$ is a subgroup, then also $\varphi^{-1}(V) \subseteq G$ is a subgroup.

3.8 Let $\varphi : G \to H$ be a group homomorphism and let $e_G$ and $e_H$ be the neutral
elements of the groups $G$ and $H$, respectively.

(a) Show that $\varphi(e_G) = e_H$.
(b) Let $\ker(\varphi) := \{g \in G \mid \varphi(g) = e_H\}$. Show that $\varphi$ is injective if and only
if $\ker(\varphi) = \{e_G\}$.

3.9 Show the properties in Definition 3.7 for $(R, +, *)$ from Example 3.9 in order
to show that $(R, +, *)$ is a commutative ring with unit. Suppose that in Example
3.9 we replace the codomain $\mathbb{R}$ of the maps by a commutative ring with unit.
Is $(R, +, *)$ then still a commutative ring with unit?

3.10 Let $R$ be a ring and $n \in \mathbb{N}$. Show the following assertions:

(a) For all $a \in R$ we have
\[ (-a)^n = \begin{cases} a^n, & \text{if } n \text{ is even}, \\ -a^n, & \text{if } n \text{ is odd}. \end{cases} \]
(b) If there exists a unit in $R$ and if $a^n = 0$ for $a \in R$, then $1 - a$ is invertible.
(An element $a \in R$ with $a^n = 0$ for some $n \in \mathbb{N}$ is called nilpotent.)

3.11 Let $R$ be a ring with unit. Show that $1 = 0$ if and only if $R = \{0\}$.

3.12 Let $(R, +, *)$ be a ring with unit and let $R^{\times}$ denote the set of all invertible
elements of $R$.

(a) Show that $(R^{\times}, *)$ is a group (called the group of units of $R$).
(b) Determine the sets $\mathbb{Z}^{\times}$, $K^{\times}$, and $K[t]^{\times}$, when $K$ is a field.

3.13 For fixed $n \in \mathbb{N}$ let $n\mathbb{Z} = \{nk \mid k \in \mathbb{Z}\}$ and $\mathbb{Z}/n\mathbb{Z} = \{[0], [1], \dots, [n-1]\}$ be
as in Example 2.29.

(a) Show that $n\mathbb{Z}$ is a subgroup of $\mathbb{Z}$.
(b) Define by

\[ \oplus : \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}, \quad ([a], [b]) \mapsto [a] \oplus [b] = [a + b], \]
\[ \odot : \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}, \quad ([a], [b]) \mapsto [a] \odot [b] = [a \cdot b], \]

an addition and multiplication in $\mathbb{Z}/n\mathbb{Z}$ (with $+$ and $\cdot$ being the addition
and multiplication in $\mathbb{Z}$). Show the following assertions:

(i) $\oplus$ and $\odot$ are well defined.
(ii) $(\mathbb{Z}/n\mathbb{Z}, \oplus, \odot)$ is a commutative ring with unit.
(iii) $(\mathbb{Z}/n\mathbb{Z}, \oplus, \odot)$ is a field if and only if $n$ is a prime number.

3.14 Let $(R, +, *)$ be a ring. A subset $S \subseteq R$ is called a subring of $R$, if $(S, +, *)$
is a ring. Show that $S$ is a subring of $R$ if and only if the following properties
hold:

(1) $S \subseteq R$.
(2) $0_R \in S$.
(3) For all $r, s \in S$ also $r + s \in S$ and $r * s \in S$.
(4) For all $r \in S$ also $-r \in S$.

3.15 Show that the Definitions 3.12 and 3.13 of a field describe the same mathematical
structure.

3.16 Let $(K, +, *)$ be a field. Show that $(L, +, *)$ is a subfield of $(K, +, *)$ (cp.
Definition 3.15), if and only if the following properties hold:

(1) $L \subseteq K$.
(2) $0_K, 1_K \in L$.
(3) $a + b \in L$ and $a * b \in L$ for all $a, b \in L$.
(4) $-a \in L$ for all $a \in L$.
(5) $a^{-1} \in L$ for all $a \in L \setminus \{0\}$.

3.17 Show that in a field $1 + 1 = 0$ holds if and only if $1 + 1 + 1 + 1 = 0$.

3.18 Let $(R, +, *)$ be a commutative ring with $1 \neq 0$ that does not contain non-trivial
zero divisors. (Such a ring is called an integral domain.)

(a) Define on $M = R \times R \setminus \{0\}$ a relation by

\[ (x, y) \sim (\hat{x}, \hat{y}) \;:\Leftrightarrow\; x * \hat{y} = y * \hat{x}. \]

Show that this is an equivalence relation.
(b) Denote the equivalence class $[(x, y)]$ by $\frac{x}{y}$. Show that the following maps
are well defined:

\[ \oplus : (M/\sim) \times (M/\sim) \to (M/\sim) \quad \text{with} \quad \frac{x}{y} \oplus \frac{\hat{x}}{\hat{y}} := \frac{x * \hat{y} + y * \hat{x}}{y * \hat{y}}, \]
\[ \odot : (M/\sim) \times (M/\sim) \to (M/\sim) \quad \text{with} \quad \frac{x}{y} \odot \frac{\hat{x}}{\hat{y}} := \frac{x * \hat{x}}{y * \hat{y}}, \]

where $M/\sim$ denotes the quotient set with respect to $\sim$ (cp. Definition 2.27).
(c) Show that $(M/\sim, \oplus, \odot)$ is a field. (This field is called the quotient field
associated with $R$.)
(d) Which field is $(M/\sim, \oplus, \odot)$ for $R = \mathbb{Z}$?

3.19 In Exercise 3.18 consider $R = K[t]$, the ring of polynomials over the field $K$,
and construct in this way the field of rational functions.

3.20 Let $a = 2 + \mathbf{i} \in \mathbb{C}$ and $b = 1 - 3\mathbf{i} \in \mathbb{C}$. Determine $-a$, $-b$, $a + b$, $a - b$,
$a^{-1}$, $b^{-1}$, $a^{-1}a$, $b^{-1}b$, $ab$, $ba$.

3.21 Show the following rules for the complex numbers:

(a) $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$ and $\overline{z_1 z_2} = \overline{z_1}\,\overline{z_2}$ for all $z_1, z_2 \in \mathbb{C}$.
(b) $\overline{z^{-1}} = (\overline{z})^{-1}$ and $\mathrm{Re}(z^{-1}) = \frac{1}{|z|^2} \mathrm{Re}(z)$ for all $z \in \mathbb{C} \setminus \{0\}$.

3.22 Show that the absolute value of complex numbers satisfies the following properties:

(a) $|z_1 z_2| = |z_1|\,|z_2|$ for all $z_1, z_2 \in \mathbb{C}$.
(b) $|z| \geq 0$ for all $z \in \mathbb{C}$ with equality if and only if $z = 0$.
(c) $|z_1 + z_2| \leq |z_1| + |z_2|$ for all $z_1, z_2 \in \mathbb{C}$.

Chapter 4 
Matrices 


In this chapter we define matrices with their most important operations and we study 
several groups and rings of matrices. James Joseph Sylvester (1814-1897) coined the
term matrix¹ in 1850 and described matrices as “an oblong arrangement of terms”.
The matrix operations defined in this chapter were introduced by Arthur Cayley 
(1821-1895) in 1858. His article “A memoir on the theory of matrices” was the first 
to consider matrices as independent algebraic objects. In our book matrices form the 
central approach to the theory of Linear Algebra. 


4.1 Basic Definitions and Operations 


We begin with a formal definition of matrices. 


Definition 4.1 Let $R$ be a commutative ring with unit and let $n, m \in \mathbb{N}_0$. An array
of the form

\[ A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \]

with $a_{ij} \in R$, $i = 1, \dots, n$, $j = 1, \dots, m$, is called a matrix of size $n \times m$ over $R$.
The $a_{ij}$ are called the entries or coefficients of the matrix. The set of all such matrices
is denoted by $R^{n,m}$.

¹The Latin word “matrix” means “womb”. Sylvester considered matrices as objects “out of which
we may form various systems of determinants” (cp. Chap. 5). Interestingly, the English writer
Charles Lutwidge Dodgson (1832-1898), better known by his pen name Lewis Carroll, objected to
Sylvester’s term and wrote in 1867: “I am aware that the word ‘Matrix’ is already in use to express
the very meaning for which I use the word ‘Block’; but surely the former word means rather the
mould, or form, into which algebraic quantities may be introduced, than an actual assemblage of
such quantities”. Dodgson also objected to the notation $a_{ij}$ for the matrix entries: “...most of the
space is occupied by a number of a’s, which are wholly superfluous, while the only important part
of the notation is reduced to minute subscripts, alike difficult to the writer and the reader.”

© Springer International Publishing Switzerland 2015
J. Liesen and V. Mehrmann, Linear Algebra, Springer Undergraduate
Mathematics Series, DOI 10.1007/978-3-319-24346-7_4


In the following we usually assume (without explicitly mentioning it) that $1 \neq 0$
in $R$. This excludes the trivial case of the ring that contains only the zero element
(cp. Exercise 3.11).

Formally, in Definition 4.1 for $n = 0$ or $m = 0$ we obtain “empty matrices” of the
size $0 \times m$, $n \times 0$ or $0 \times 0$. We denote such matrices by $[\ ]$. They will be used for
technical reasons in some of the proofs below. When we analyze algebraic properties
of matrices, however, we always consider $n, m \geq 1$.

The zero matrix in $R^{n,m}$, denoted by $0_{n,m}$ or just $0$, is the matrix that has all its
entries equal to $0 \in R$.

A matrix of size $n \times n$ is called a square matrix or just square. The entries $a_{ii}$ for
$i = 1, \dots, n$ are called the diagonal entries of $A$. The identity matrix in $R^{n,n}$ is the
matrix $I_n := [\delta_{ij}]$, where

\[ \delta_{ij} := \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j, \end{cases} \tag{4.1} \]

is the Kronecker delta-function.² If it is clear which $n$ is considered, then we just
write $I$ instead of $I_n$. For $n = 0$ we set $I_0 := [\ ]$.

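The $i$th row of $A \in R^{n,m}$ is $[a_{i1}, a_{i2}, \dots, a_{im}] \in R^{1,m}$, $i = 1, \dots, n$, where we
use commas for the optical separation of the entries. The $j$th column of $A$ is

\[ \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix} \in R^{n,1}, \quad j = 1, \dots, m. \]

Thus, the rows and columns of a matrix are again matrices.
If $1 \times m$ matrices $a_i := [a_{i1}, a_{i2}, \dots, a_{im}] \in R^{1,m}$, $i = 1, \dots, n$, are given, then
we can combine them to the matrix

\[ A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \in R^{n,m}. \]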

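²Leopold Kronecker (1823-1891).

We then do not write square brackets around the rows of $A$. In the same way we can
combine the $n \times 1$ matrices

\[ a_j := \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix} \in R^{n,1}, \quad j = 1, \dots, m, \]

to the matrix

\[ A = [a_1, a_2, \dots, a_m] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \in R^{n,m}. \]

If $n_1, n_2, m_1, m_2 \in \mathbb{N}_0$ and $A_{ij} \in R^{n_i, m_j}$, $i, j = 1, 2$, then we can combine these
four matrices to the matrix

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in R^{n_1+n_2,\, m_1+m_2}. \]

The matrices $A_{ij}$ are then called blocks of the block matrix $A$.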
We now introduce four operations for matrices and begin with the addition:

\[ + : R^{n,m} \times R^{n,m} \to R^{n,m}, \quad (A, B) \mapsto A + B := [a_{ij} + b_{ij}]. \]

The addition in $R^{n,m}$ operates entrywise, based on the addition in $R$. Note that the
addition is only defined for matrices of equal size.
The multiplication of two matrices is defined as follows:

\[ * : R^{n,m} \times R^{m,s} \to R^{n,s}, \quad (A, B) \mapsto A * B = [c_{ij}], \quad c_{ij} := \sum_{k=1}^{m} a_{ik} b_{kj}. \]

Thus, the entry $c_{ij}$ of the product $A * B$ is constructed by successive multiplication
and summing up the entries in the $i$th row of $A$ and the $j$th column of $B$. Clearly, in
order to define the product $A * B$, the number of columns of $A$ must be equal to the
number of rows in $B$.

In the definition of the entries $c_{ij}$ of the matrix $A * B$ we have not written the
multiplication symbol for the elements in $R$. This follows the usual convention of
omitting the multiplication sign when it is clear which multiplication is considered.
Eventually we will also omit the multiplication sign between matrices.

We can illustrate the multiplication rule “$c_{ij}$ equals $i$th row of $A$ times $j$th column
of $B$” as follows:

\[ c_{ij} = \begin{bmatrix} a_{i1} & \cdots & a_{im} \end{bmatrix} * \begin{bmatrix} b_{1j} \\ \vdots \\ b_{mj} \end{bmatrix} = a_{i1} b_{1j} + \ldots + a_{im} b_{mj}. \]

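It is important to note that the matrix multiplication in general is not commutative.
Example 4.2 For the matrices

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,3}, \qquad B = \begin{bmatrix} -1 & 1 \\ 0 & 0 \\ 1 & -1 \end{bmatrix} \in \mathbb{Z}^{3,2} \]

we have

\[ A * B = \begin{bmatrix} 2 & -2 \\ 2 & -2 \end{bmatrix} \in \mathbb{Z}^{2,2}. \]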

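On the other hand, $B * A \in \mathbb{Z}^{3,3}$. Although $A * B$ and $B * A$ are both defined, we
obviously have $A * B \neq B * A$. In this case one recognizes the non-commutativity
of the matrix multiplication from the fact that $A * B$ and $B * A$ have different sizes.
But even if $A * B$ and $B * A$ are both defined and have the same size, in general
$A * B \neq B * A$. For example,

\[ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \in \mathbb{Z}^{2,2}, \qquad B = \begin{bmatrix} 4 & 0 \\ 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,2} \]

yield the two products

\[ A * B = \begin{bmatrix} 14 & 12 \\ 15 & 18 \end{bmatrix} \quad \text{and} \quad B * A = \begin{bmatrix} 4 & 8 \\ 5 & 28 \end{bmatrix}. \]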

The matrix multiplication is, however, associative and distributive with respect to
the matrix addition.

Lemma 4.3 For $A, \hat{A} \in R^{n,m}$, $B, \hat{B} \in R^{m,\ell}$ and $C \in R^{\ell,k}$ the following assertions
hold:

(1) $A * (B * C) = (A * B) * C$.
(2) $(A + \hat{A}) * B = A * B + \hat{A} * B$.
(3) $A * (B + \hat{B}) = A * B + A * \hat{B}$.
(4) $I_n * A = A * I_m = A$.

Proof We only show property (1); the others are exercises. Let $A \in R^{n,m}$, $B \in R^{m,\ell}$,
$C \in R^{\ell,k}$ as well as $(A * B) * C = [d_{ij}]$ and $A * (B * C) = [\hat{d}_{ij}]$. By the definition
of the matrix multiplication and using the associative and distributive law in $R$, we
get

\[ d_{ij} = \sum_{s=1}^{\ell} \Big( \sum_{t=1}^{m} a_{it} b_{ts} \Big) c_{sj} = \sum_{s=1}^{\ell} \sum_{t=1}^{m} (a_{it} b_{ts}) c_{sj} = \sum_{s=1}^{\ell} \sum_{t=1}^{m} a_{it} (b_{ts} c_{sj}) = \sum_{t=1}^{m} a_{it} \Big( \sum_{s=1}^{\ell} b_{ts} c_{sj} \Big) = \hat{d}_{ij} \]

for $1 \leq i \leq n$ and $1 \leq j \leq k$, which implies that $(A * B) * C = A * (B * C)$. □


On the right-hand sides of (2) and (3) in Lemma 4.3 we have not written parentheses,
since we will use the common convention that the multiplication of matrices
binds more strongly than the addition.

For $A \in R^{n,n}$ we define

\[ A^k := \underbrace{A * \ldots * A}_{k \text{ times}} \quad \text{for } k \in \mathbb{N}, \qquad A^0 := I_n. \]


Another multiplicative operation for matrices is the multiplication with a scalar,³
which is defined as follows:

\[ \cdot : R \times R^{n,m} \to R^{n,m}, \quad (\lambda, A) \mapsto \lambda \cdot A := [\lambda a_{ij}]. \tag{4.2} \]

We easily see that $0 \cdot A = 0_{n,m}$ and $1 \cdot A = A$ for all $A \in R^{n,m}$. In addition, the
scalar multiplication has the following properties.

Lemma 4.4 For $A, B \in R^{n,m}$, $C \in R^{m,\ell}$ and $\lambda, \mu \in R$ the following assertions
hold:

(1) $(\lambda \mu) \cdot A = \lambda \cdot (\mu \cdot A)$.
(2) $(\lambda + \mu) \cdot A = \lambda \cdot A + \mu \cdot A$.
(3) $\lambda \cdot (A + B) = \lambda \cdot A + \lambda \cdot B$.
(4) $(\lambda \cdot A) * C = \lambda \cdot (A * C) = A * (\lambda \cdot C)$.

Proof Exercise. □

³The term “scalar” was introduced in 1845 by Sir William Rowan Hamilton (1805-1865). It originates
from the Latin word “scale” which means “ladder”.


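The fourth matrix operation that we introduce is the transposition:

\[ T : R^{n,m} \to R^{m,n}, \quad A = [a_{ij}] \mapsto A^T = [b_{ij}], \quad b_{ij} := a_{ji}. \]

For example,

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,3}, \qquad A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \in \mathbb{Z}^{3,2}. \]

The matrix $A^T$ is called the transpose of $A$.

Definition 4.5 If $A \in R^{n,n}$ satisfies $A = A^T$, then $A$ is called symmetric. If $A = -A^T$,
then $A$ is called skew-symmetric.

For the transposition we have the following properties.

Lemma 4.6 For $A, \hat{A} \in R^{n,m}$, $B \in R^{m,\ell}$ and $\lambda \in R$ the following assertions hold:

(1) $(A^T)^T = A$.
(2) $(A + \hat{A})^T = A^T + \hat{A}^T$.
(3) $(\lambda \cdot A)^T = \lambda \cdot A^T$.
(4) $(A * B)^T = B^T * A^T$.

Proof Properties (1)-(3) are exercises. For the proof of (4) let $A * B = [c_{ij}]$ with
$c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$, $A^T = [\tilde{a}_{ij}]$, $B^T = [\tilde{b}_{ij}]$ and $(A * B)^T = [\tilde{c}_{ij}]$. Then

\[ \tilde{c}_{ij} = c_{ji} = \sum_{k=1}^{m} a_{jk} b_{ki} = \sum_{k=1}^{m} \tilde{a}_{kj} \tilde{b}_{ik} = \sum_{k=1}^{m} \tilde{b}_{ik} \tilde{a}_{kj}, \]

from which we see that $(A * B)^T = B^T * A^T$. □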
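Property (4) of Lemma 4.6 can be checked numerically on a small instance. The sketch below assumes integer matrices stored as lists of rows; the helper names `transpose` and `mat_mul` and the matrix `B` are illustrative, and a single example is of course not a proof.

```python
def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Row-times-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6]]     # the 2 x 3 example matrix from the text
B = [[1, 0], [0, 1], [2, 2]]   # a 3 x 2 matrix chosen only for illustration
print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # True
```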


MATLAB-Minute. 

Carry out the following commands in order to get used to the matrix operations
of this chapter in MATLAB notation: A=ones(5,2), A+A, A-3*A, A', A'*A,
A*A'.

(In order to see MATLAB’s output, do not put a semicolon at the end of the
command.)

Example 4.7 Consider again the example of car insurance premiums from Chap. 1.
Recall that $p_{ij}$ denotes the probability that a customer in class $C_i$ in this year will move
to the class $C_j$. Our example consists of four such classes, and the 16 probabilities
can be associated with a row-stochastic $4 \times 4$ matrix (cp. (1.2)), which we denote by
$P$. Suppose that the insurance company has the following distribution of customers
in the four classes: 40 % in class $C_1$, 30 % in class $C_2$, 20 % in class $C_3$, and 10 % in
class $C_4$. Then the $1 \times 4$ matrix

\[ p_0 := [0.4, 0.3, 0.2, 0.1] \]

describes the initial customer distribution. Using the matrix multiplication we now
compute

\[ p_1 := p_0 * P = [0.4, 0.3, 0.2, 0.1] * \begin{bmatrix} 0.15 & 0.85 & 0.00 & 0.00 \\ 0.15 & 0.00 & 0.85 & 0.00 \\ 0.05 & 0.10 & 0.00 & 0.85 \\ 0.05 & 0.00 & 0.10 & 0.85 \end{bmatrix} = [0.12, 0.36, 0.265, 0.255]. \]

Then $p_1$ contains the distribution of the customers in the next year. As an example,
consider the entry of $p_0 * P$ in position $(1, 4)$, which is computed by

\[ 0.4 \cdot 0.00 + 0.3 \cdot 0.00 + 0.2 \cdot 0.85 + 0.1 \cdot 0.85 = 0.255. \]

A customer in the classes $C_1$ or $C_2$ in this year cannot move to the class $C_4$. Thus,
the respective initial percentages are multiplied by the probabilities $p_{14} = 0.00$
and $p_{24} = 0.00$. A customer in the class $C_3$ or $C_4$ will be in the class $C_4$ with the
probabilities $p_{34} = 0.85$ or $p_{44} = 0.85$, respectively. This yields the two products
$0.2 \cdot 0.85$ and $0.1 \cdot 0.85$.

Continuing in the same way we obtain after $k$ years the distribution

\[ p_k := p_0 * P^k, \quad k = 0, 1, 2, \dots. \]

(This formula also holds for $k = 0$, since $P^0 = I_4$.) The insurance company can
use this formula to compute the revenue from the payments of premium rates in the
coming years. Assume that the full premium rate (class $C_1$) is 500 Euros per year.
Then the rates in classes $C_2$, $C_3$, and $C_4$ are 450, 400 and 300 Euros (10, 20 and
40 % discount). If there are 1000 customers initially, then the revenue in the first year
(in Euros) is

\[ 1000 \cdot \big( p_0 * [500, 450, 400, 300]^T \big) = 445000. \]

If no customer cancels the contract, then this model yields the revenue in year
$k \geq 0$ as

\[ 1000 \cdot \big( p_k * [500, 450, 400, 300]^T \big) = 1000 \cdot \big( p_0 * (P^k * [500, 450, 400, 300]^T) \big). \]

For example, the revenue in the next 4 years is 404500, 372025, 347340 and 341819
(rounded to full Euros). These numbers decrease annually, but the rate of the decrease
seems to slow down. Does there exist a “stationary state”, i.e., a state when the
revenue is not changing (significantly) any more? Which properties of the model
guarantee the existence of such a state? These are important practical questions for
the insurance company. Only the existence of a stationary state guarantees significant
revenues in the long-term future. Since the formula depends essentially on the entries
of the matrix $P^k$, we have reached an interesting problem of Linear Algebra: the
analysis of the properties of row-stochastic matrices. We will analyze these properties
in Sect. 8.3.
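The revenue figures of Example 4.7 can be recomputed with a short script. This is a sketch using floating point numbers and the data of the example; the names `step` and `revenues` are hypothetical helpers, and the distribution $p_k$ is obtained by repeatedly multiplying with $P$ instead of forming $P^k$ explicitly.

```python
P = [
    [0.15, 0.85, 0.00, 0.00],
    [0.15, 0.00, 0.85, 0.00],
    [0.05, 0.10, 0.00, 0.85],
    [0.05, 0.00, 0.10, 0.85],
]
p0 = [0.4, 0.3, 0.2, 0.1]       # initial customer distribution
rates = [500, 450, 400, 300]    # premium rates of the classes C1, ..., C4


def step(p, P):
    """One year: the row vector p is multiplied from the left onto P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]


def revenues(p0, P, rates, customers, years):
    """Revenue customers * (p_k * rates^T) for k = 0, 1, ..., years."""
    p, result = p0, []
    for _ in range(years + 1):
        result.append(customers * sum(x * r for x, r in zip(p, rates)))
        p = step(p, P)
    return result


print([round(r) for r in revenues(p0, P, rates, 1000, 4)])
# [445000, 404500, 372025, 347340, 341819]
```

Increasing the number of years in the last call gives a first numerical impression of whether the revenue approaches a stationary value.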
4.2 Matrix Groups and Rings


In this section we study algebraic structures that are formed by certain sets of matrices
and the matrix operations introduced above. We begin with the addition in $R^{n,m}$.


Theorem 4.8 $(R^{n,m}, +)$ is a commutative group. The neutral element is $0 \in R^{n,m}$
(the zero matrix) and for $A = [a_{ij}] \in R^{n,m}$ the inverse element is $-A := [-a_{ij}] \in R^{n,m}$.
(We write $A - B$ instead of $A + (-B)$.)

Proof Using the associativity of the addition in $R$, for arbitrary $A, B, C \in R^{n,m}$, we
obtain

\[ (A + B) + C = [a_{ij} + b_{ij}] + [c_{ij}] = [(a_{ij} + b_{ij}) + c_{ij}] = [a_{ij} + (b_{ij} + c_{ij})] = [a_{ij}] + [b_{ij} + c_{ij}] = A + (B + C). \]

Thus, the addition in $R^{n,m}$ is associative.

The zero matrix $0 \in R^{n,m}$ satisfies $0 + A = [0] + [a_{ij}] = [0 + a_{ij}] = [a_{ij}] = A$.
For a given $A = [a_{ij}] \in R^{n,m}$ and $-A := [-a_{ij}] \in R^{n,m}$ we have
$-A + A = [-a_{ij}] + [a_{ij}] = [-a_{ij} + a_{ij}] = [0] = 0$.

Finally, the commutativity of the addition in $R$ implies that
$A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = B + A$. □

Note that (2) in Lemma 4.6 implies that the transposition is a homomorphism (even
an isomorphism) between the groups $(R^{n,m}, +)$ and $(R^{m,n}, +)$ (cp. Definition 3.6).

Theorem 4.9 $(R^{n,n}, +, *)$ is a ring with unit given by the identity matrix $I_n$. This
ring is commutative only for $n = 1$.

Proof We have already shown that $(R^{n,n}, +)$ is a commutative group (cp. Theorem 4.8).
The other properties of a ring (associativity, distributivity and the existence
of a unit element) follow from Lemma 4.3. The commutativity for $n = 1$ holds
because of the commutativity of the multiplication in the ring $R$. The example

\[ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} * \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \neq \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} * \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]

shows that the ring $R^{n,n}$ is not commutative for $n \geq 2$. □

The example in the proof of Theorem 4.9 shows that for $n \geq 2$ the ring $R^{n,n}$ has
non-trivial zero-divisors, i.e., there exist matrices $A, B \in R^{n,n} \setminus \{0\}$ with $A * B = 0$.
These exist even when $R$ is a field.

Let us now consider the invertibility of matrices in the ring $R^{n,n}$ (with respect to the
matrix multiplication). For a given matrix $A \in R^{n,n}$, an inverse $\tilde{A} \in R^{n,n}$ must satisfy
the two equations $\tilde{A} * A = I_n$ and $A * \tilde{A} = I_n$ (cp. Definition 3.10). If an inverse of
$A \in R^{n,n}$ exists, i.e., if $A$ is invertible, then the inverse is unique and denoted by $A^{-1}$
(cp. Theorem 3.11). An invertible matrix is sometimes called non-singular, while
a non-invertible matrix is called singular. We will show in Corollary 7.20 that the
existence of the inverse already is implied by one of the two equations $\tilde{A} * A = I_n$
and $A * \tilde{A} = I_n$, i.e., if one of them holds, then $A$ is invertible and $A^{-1} = \tilde{A}$. Until
then, to be correct, we will have to check the validity of both equations.

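Not all matrices $A \in R^{n,n}$ are invertible. Simple examples are the non-invertible
matrices

\[ A = [0] \in R^{1,1} \quad \text{and} \quad A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \in R^{2,2}. \]

Another non-invertible matrix is

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} \in \mathbb{Z}^{2,2}. \]

However, considered as an element of $\mathbb{Q}^{2,2}$, the (unique) inverse of $A$ is given by

\[ A^{-1} = \begin{bmatrix} 1 & -\frac{1}{2} \\ 0 & \frac{1}{2} \end{bmatrix} \in \mathbb{Q}^{2,2}. \]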

Lemma 4.10 If $A, B \in R^{n,n}$ are invertible, then the following assertions hold:

(1) $A^T$ is invertible with $(A^T)^{-1} = (A^{-1})^T$. (We also write this matrix as $A^{-T}$.)
(2) $A * B$ is invertible with $(A * B)^{-1} = B^{-1} * A^{-1}$.

Proof

(1) Using (4) in Lemma 4.6 we have

\[ (A^{-1})^T * A^T = (A * A^{-1})^T = I_n^T = I_n = I_n^T = (A^{-1} * A)^T = A^T * (A^{-1})^T, \]

and thus $(A^{-1})^T$ is the inverse of $A^T$.
(2) This was already shown in Theorem 3.11 for general rings with unit and thus it
holds, in particular, for the ring $(R^{n,n}, +, *)$. □

Our next result shows that the invertible matrices form a multiplicative group.

Theorem 4.11 The set of invertible $n \times n$ matrices over $R$ forms a group with respect
to the matrix multiplication. We denote this group by $GL_n(R)$ (“GL” abbreviates
“general linear (group)”).

Proof The associativity of the multiplication in $GL_n(R)$ is clear. As shown in (2)
in Lemma 4.10, the product of two invertible matrices is an invertible matrix. The
neutral element in $GL_n(R)$ is the identity matrix $I_n$, and since every $A \in GL_n(R)$
is assumed to be invertible, $A^{-1}$ exists with $(A^{-1})^{-1} = A \in GL_n(R)$. □


We now introduce some important classes of matrices. 
Definition 4.12 Let $A = [a_{ij}] \in R^{n,n}$.

(1) $A$ is called upper triangular, if $a_{ij} = 0$ for all $i > j$.
$A$ is called lower triangular, if $a_{ij} = 0$ for all $j > i$ (i.e., $A^T$ is upper triangular).
(2) $A$ is called diagonal, if $a_{ij} = 0$ for all $i \neq j$ (i.e., $A$ is upper and lower triangular).
We write a diagonal matrix as $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$.


We next investigate these sets of matrices with respect to their group properties, 
beginning with the invertible upper and lower triangular matrices. 


Theorem 4.13 The sets of the invertible upper triangular $n \times n$ matrices and of the
invertible lower triangular $n \times n$ matrices over $R$ form subgroups of $GL_n(R)$.

Proof We will only show the result for the upper triangular matrices; the proof for the
lower triangular matrices is analogous. In order to establish the subgroup property
we will prove the three properties from Theorem 3.5.

Since $I_n$ is an invertible upper triangular matrix, the set of the invertible upper
triangular matrices is a nonempty subset of $GL_n(R)$.

Next we show that for two invertible upper triangular matrices $A, B \in R^{n,n}$ the
product $C = A * B$ is again an invertible upper triangular matrix. The invertibility
of $C = [c_{ij}]$ follows from (2) in Lemma 4.10. For $i > j$ we have

\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{j} a_{ik} b_{kj} \quad (\text{here } b_{kj} = 0 \text{ for } k > j) \]
\[ \phantom{c_{ij}} = 0 \quad (\text{here } a_{ik} = 0 \text{ for } k = 1, \dots, j, \text{ since } i > j). \]

Therefore, $C$ is upper triangular.

It remains to prove that the inverse $A^{-1}$ of an invertible upper triangular matrix $A$
is an upper triangular matrix. For $n = 1$ the assertion holds trivially, so we assume
that $n \geq 2$. Let $A^{-1} = [c_{ij}]$, then the equation $A * A^{-1} = I_n$ can be written as a
system of $n$ equations

\[ \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{bmatrix} \begin{bmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{bmatrix} = \begin{bmatrix} \delta_{1j} \\ \vdots \\ \delta_{nj} \end{bmatrix}, \quad j = 1, \dots, n. \tag{4.3} \]

Here, $\delta_{ij}$ is the Kronecker delta-function defined in (4.1).
We will now prove inductively for $i = n, n-1, \dots, 1$ that the diagonal entry $a_{ii}$
of $A$ is invertible with $a_{ii}^{-1} = c_{ii}$, and that

\[ c_{ij} = a_{ii}^{-1} \Big( \delta_{ij} - \sum_{\ell=i+1}^{n} a_{i\ell} c_{\ell j} \Big), \quad j = 1, \dots, n. \tag{4.4} \]

This formula implies, in particular, that $c_{ij} = 0$ for $i > j$.
For $i = n$ the last row of (4.3) is given by

\[ a_{nn} c_{nj} = \delta_{nj}, \quad j = 1, \dots, n. \]

For $j = n$ we have $a_{nn} c_{nn} = 1 = c_{nn} a_{nn}$, where in the second equation we use the
commutativity of the multiplication in $R$. Therefore, $a_{nn}$ is invertible with $a_{nn}^{-1} = c_{nn}$,
and thus

\[ c_{nj} = a_{nn}^{-1} \delta_{nj}, \quad j = 1, \dots, n. \]

This is equivalent to (4.4) for $i = n$. (Note that for $i = n$ in (4.4) the sum is empty
and thus equal to zero.) In particular, $c_{nj} = 0$ for $j = 1, \dots, n-1$.

Now assume that our assertion holds for $i = n, \dots, k+1$, where $1 \leq k \leq n-1$.
Then, in particular, $c_{ij} = 0$ for $k+1 \leq i \leq n$ and $i > j$. In words, the rows
$i = n, \dots, k+1$ of $A^{-1}$ are in “upper triangular form”. In order to prove the
assertion for $i = k$, we consider the $k$th row in (4.3), which is given by

\[ a_{kk} c_{kj} + a_{k,k+1} c_{k+1,j} + \ldots + a_{kn} c_{nj} = \delta_{kj}, \quad j = 1, \dots, n. \tag{4.5} \]

For $j = k$ $(< n)$ we obtain

\[ a_{kk} c_{kk} + a_{k,k+1} c_{k+1,k} + \ldots + a_{kn} c_{nk} = 1. \]

By the induction hypothesis, we have $c_{k+1,k} = \ldots = c_{n,k} = 0$. This implies
$a_{kk} c_{kk} = 1 = c_{kk} a_{kk}$, where we have used the commutativity of the multiplication in $R$. Hence
$a_{kk}$ is invertible with $a_{kk}^{-1} = c_{kk}$. From (4.5) we get

\[ c_{kj} = a_{kk}^{-1} \big( \delta_{kj} - a_{k,k+1} c_{k+1,j} - \ldots - a_{kn} c_{nj} \big), \quad j = 1, \dots, n, \]

and hence (4.4) holds for $i = k$. If $k > j$, then $\delta_{kj} = 0$ and $c_{k+1,j} = \ldots = c_{nj} = 0$,
which gives $c_{kj} = 0$. □


We point out that (4.4) represents a recursive formula for computing the entries of 
the inverse of an invertible upper triangular matrix. Using this formula the entries are 
computed “from bottom to top” and “from right to left”. This process is sometimes 
called backward substitution. 
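The recursion (4.4) can be turned into a short program. The following is a sketch under the assumption that the entries are rational numbers, so that Python's `fractions.Fraction` provides exact inverses of the diagonal entries; the function name `upper_triangular_inverse` and the test matrix are illustrative and not taken from the text.

```python
from fractions import Fraction


def upper_triangular_inverse(A):
    """Invert an upper triangular matrix with formula (4.4), rows from bottom to top."""
    n = len(A)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):            # i = n, n-1, ..., 1 in the text
        inv_aii = 1 / Fraction(A[i][i])       # fails if a_ii is not invertible
        for j in range(n - 1, -1, -1):        # from right to left
            delta = 1 if i == j else 0
            s = sum(Fraction(A[i][l]) * C[l][j] for l in range(i + 1, n))
            C[i][j] = inv_aii * (delta - s)
    return C


A = [[2, 1, 0],
     [0, 1, 3],
     [0, 0, 4]]
for row in upper_triangular_inverse(A):
    print([str(c) for c in row])
# ['1/2', '-1/2', '3/8']
# ['0', '1', '-3/4']
# ['0', '0', '1/4']
```

Each row of the inverse only uses rows below it that are already known, which is exactly the “from bottom to top” order described above.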

In the following we will frequently partition matrices into blocks and make use
of the block multiplication: For every $k \in \{1, \dots, n-1\}$, we can write $A \in R^{n,n}$ as

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad \text{with } A_{11} \in R^{k,k} \text{ and } A_{22} \in R^{n-k,n-k}. \]

If $A, B \in R^{n,n}$ are both partitioned like this, then the product $A * B$ can be evaluated
blockwise, i.e.,

\[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} * \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11} * B_{11} + A_{12} * B_{21} & A_{11} * B_{12} + A_{12} * B_{22} \\ A_{21} * B_{11} + A_{22} * B_{21} & A_{21} * B_{12} + A_{22} * B_{22} \end{bmatrix}. \tag{4.6} \]

In particular, if

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \]

with $A_{11} \in GL_k(R)$ and $A_{22} \in GL_{n-k}(R)$, then $A \in GL_n(R)$ and a direct computation
shows that

\[ A^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} * A_{12} * A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}. \tag{4.7} \]

MATLAB-Minute.

Create block matrices in MATLAB by carrying out the following commands:

k=5;
A11=gallery('tridiag',-ones(k-1,1),2*ones(k,1),-ones(k-1,1));
A12=zeros(k,2); A12(1,1)=1; A12(2,2)=1;
A22=-eye(2);
A=full([A11 A12; A12' A22])
B=full([A11 A12; zeros(2,k) -A22])

Investigate the meaning of the command full. Compute the products A*B
and B*A as well as the inverses inv(A) and inv(B). Compute the inverse of
B in MATLAB with the formula (4.7).
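Formula (4.7) can also be checked on a small example outside of MATLAB. The sketch below assumes rational entries and blocks whose inverses are supplied by hand; all names (`mat_mul`, `neg`, `assemble`) and the concrete blocks are illustrative choices.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Row-times-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def neg(A):
    """Entrywise additive inverse."""
    return [[-a for a in row] for row in A]


def assemble(A11, A12, A21, A22):
    """Combine four blocks into one block matrix."""
    top = [r1 + r2 for r1, r2 in zip(A11, A12)]
    bottom = [r1 + r2 for r1, r2 in zip(A21, A22)]
    return top + bottom


A11 = [[F(1), F(1)], [F(0), F(1)]]
A11_inv = [[F(1), F(-1)], [F(0), F(1)]]   # inverse of A11, supplied by hand
A12 = [[F(3)], [F(4)]]
A22 = [[F(2)]]
A22_inv = [[F(1, 2)]]                     # inverse of A22, supplied by hand
zero = [[F(0), F(0)]]                     # the 1 x 2 zero block

A = assemble(A11, A12, zero, A22)
upper_right = neg(mat_mul(mat_mul(A11_inv, A12), A22_inv))
A_inv = assemble(A11_inv, upper_right, zero, A22_inv)   # right-hand side of (4.7)

identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
print(mat_mul(A, A_inv) == identity and mat_mul(A_inv, A) == identity)  # True
```

Both products are compared with the identity, in line with the remark that, until Corollary 7.20 is available, both equations have to be checked.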


Corollary 4.14 The set of the invertible diagonal $n \times n$ matrices over $R$ forms a
commutative subgroup (with respect to the matrix multiplication) of the invertible
upper (or lower) triangular $n \times n$ matrices over $R$.

Proof Since $I_n$ is an invertible diagonal matrix, the invertible diagonal $n \times n$ matrices
form a nonempty subset of the invertible upper (or lower) triangular $n \times n$ matrices.
If $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ and $B = \mathrm{diag}(b_{11}, \dots, b_{nn})$ are invertible, then $A * B$ is
invertible (cp. (2) in Lemma 4.10) and diagonal, since

\[ A * B = \mathrm{diag}(a_{11}, \dots, a_{nn}) * \mathrm{diag}(b_{11}, \dots, b_{nn}) = \mathrm{diag}(a_{11} b_{11}, \dots, a_{nn} b_{nn}). \]

Moreover, if $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ is invertible, then $a_{ii} \in R$ is invertible for
all $i = 1, \dots, n$ (cp. the proof of Theorem 4.13). The inverse $A^{-1}$ is given by the
invertible diagonal matrix $\mathrm{diag}(a_{11}^{-1}, \dots, a_{nn}^{-1})$. Finally, the commutativity property
$A * B = B * A$ follows directly from the commutativity in $R$. □


Definition 4.15 A matrix $P \in R^{n,n}$ is called a permutation matrix, if in every row
and every column of $P$ there is exactly one unit and all other entries are zero.

The term “permutation” means “exchange”. If a matrix $A \in R^{n,n}$ is multiplied
with a permutation matrix from the left or from the right, then its rows or columns,
respectively, are exchanged (or permuted). For example, if

\[ P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \in \mathbb{Z}^{3,3}, \]

then

\[ P * A = \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix} \quad \text{and} \quad A * P = \begin{bmatrix} 3 & 2 & 1 \\ 6 & 5 & 4 \\ 9 & 8 & 7 \end{bmatrix}. \]

Theorem 4.16 The set of the $n \times n$ permutation matrices over $R$ forms a subgroup
of $GL_n(R)$. In particular, if $P \in R^{n,n}$ is a permutation matrix, then $P$ is invertible
with $P^{-1} = P^T$.

Proof Exercise. □
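The permutation example and the identity $P^{-1} = P^T$ from Theorem 4.16 can be checked for the $3 \times 3$ matrices $P$ and $A$ used above. This is a minimal sketch with hypothetical helper names; it verifies one instance and does not replace the proof.

```python
def mat_mul(A, B):
    """Row-times-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]


P = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(mat_mul(P, A))  # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]  (rows exchanged)
print(mat_mul(A, P))  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]  (columns exchanged)
print(mat_mul(transpose(P), P) == I3 and mat_mul(P, transpose(P)) == I3)  # True
```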


From now on we will omit the multiplication sign in the matrix multiplication 
and write AB instead of A * B. 


Exercises 
(In the following exercises R is a commutative ring with unit.) 


4.1 Consider the following matrices over Z: 


2 4 
1-2 4 —1 0 
a= ap — s c=| 1 


Determine, if possible, the matrices CA, BC, BTA, ATC, (~A)! C, B’ A’, 
AC and CB. 
4.2 Consider the matrices 


X1 
A = [aij] € R”, x=|: ER"! y = [y1,..., Ym] E€ R”. 


Xn 


Which of the following expressions are well defined for m Æ n or m = n? 


(a) xy, (b)xfy, (c)yx, (d) yx’, (e)xAy, (f) x! Ay, 
(g) xAy’, (h) x” Ay’, (i) xyA, (j) xyA* , (K) Axy, (I) A’ xy. 


4.3 Show the following computational rules: 
H1xı + p2x2 = [x1, x2] H and A[x), x2] = [Ax1, Ax2] 


for A € R™”, xua € R™! andui pio € R. 


4.2 
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4.4 Prove Lemma4.3 (2)-(4). 
4.5 Prove Lemma4.4. 
4.6 Prove Lemma4.6 (1)-(3). 


O11 


47 Let A= |001| € Z°. Determine A” for all n € N U {0}. 


000 


4.8 Let p = ant” +... + ait + aot? € R[t] be a polynomial (cp. Example 3.17) 


4.9 


4.10 


4.11 


4.12 


4.13 


4.14 


and A € R™™”. We define p(A) € R™” as p(A) := an A” +. ..+@1 A + Qom. 


: A e 

(b) For a fixed matrix A € R™” consider the map fa : R[t] > R™”, p => 
p(A). Show that fa(p +4) = fa(p) + falqa) and fa(pa) = fa(P) fala) 
for all p,q € R[t]. 
(The map fa is a ring homomorphism between the rings R[t] and R”’””.) 

(c) Show that f4(R[t]) = {p(A) | p € R[t]} is a commutative subring of R™”, 
i.e., that f4(R[t]) is a subring of R™™ (cp. Exercise 3.14) and that the 
multiplication in this subring is commutative. 

(d) Is the map f, surjective? 


(a) Determine p(A) for p = t? — 2t + 1 € Z[t] and A = 


Let K be a field with 1 + 1 Æ 0. Show that every matrix A € K”” can be 
written as A = M + S with a symmetric matrix M € K"” (i.e, MT = M) 
and a skew-symmetric matrix S$ € K™” (i.e., ST = — 8S). 

Does this also hold in a field with 1 + 1 = 0? Give a proof or a counterexample. 


Show the binomial formula for commuting matrices: If A, B e R™” with 
k k i pk-j k\, k! 
AB = BA, then (A+ B} = Jo (£) AÏ Bk-i, where (5) = ty. 


Let A e R”” be a matrix for which J, — A is invertible. Show that (I, — 
A) h= Ats D9 A” holds for every m € N. 

Let A € R”” be a matrix for which an m € N with A” = J, exists and let m 
be smallest natural number with this property. 


(a) Investigate whether A is invertible, and if so, give a particularly simple 
representation of the inverse. 
(b) Determine the cardinality of the set {A* | k € N}. 


LitA=sda le R lag =O0for7 S ck 


(a) Show that A is a subring of R””. 

(b) Show that AM e A forall M e R”” and A € A. 
(A subring with this property is called a left ideal of R”’”.) 

(c) Determine an analogous subring 6 of R””, such that MB e B for all 
M e R”” and B € B. 
(A subring with this property is called a left ideal of R™”.) 


Examine whether (G, x) with 
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4.15 
4.16 
4.17 


4.18 


4.19 


4.20 
4.21 


4.22 


4 Matrices 
G= cos(a) — sin(a) veR 
sin(a) cos(a) 
is a subgroup of GL (R). 
Generalize the block multiplication (4.6) to matrices A € R”” and B € R™*., 


Determine all invertible upper triangular matrices A € R»” with A~! = A’. 
Let Ay; € Ra: A2 € Rz, A21 € Rem, An € R”»”2 and 


Ay, A12 
A= £ puani tN | 
im A22 


(a) Let Ay; E€ GLa, (R). Show that A is invertible if and only if Az2 — 
Ay Aj, An is invertible and derive in this case a formula for AT}. 

(b) Let An € GLa (R). Show that A is invertible if and only if Ay, — 
A1245 Art is invertible and derive in this case a formula for A~!. 


Let A € GL, (R), U € R™™ and V € R™”. Show the following assertions: 
(a) A+UV € GL,(R) holds if and only if In + VA™!U € GLn(R). 
(b) If In + VAT'U € GLm(R), then 
(A+ UV)! = A7! =A “UG + VATU) vA. 
(This last equation is called the Sherman-Morrison- Woodbury formula; 
named after Jack Sherman, Winifred J. Morrison and Max A. Woodbury.) 


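4.19 Show that the set of block upper triangular matrices with invertible $2 \times 2$
diagonal blocks, i.e., the set of matrices
\[
\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1m} \\ 0 & A_{22} & \cdots & A_{2m} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & A_{mm} \end{bmatrix}, \quad A_{ii} \in GL_2(R), \quad i = 1, \dots, m,
\]
is a group with respect to the matrix multiplication.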


4.20 Prove Theorem 4.16. Is the group of permutation matrices commutative?

4.21 Show that the following is an equivalence relation on $R^{n,n}$:

\[ A \sim B \quad :\Leftrightarrow \quad \text{There exists a permutation matrix } P \text{ with } A = P^T B P. \]


4.22 A company produces from four raw materials $R_1, R_2, R_3, R_4$ five intermediate
products $Z_1, Z_2, Z_3, Z_4, Z_5$, and from these three final products $E_1, E_2, E_3$. The
following tables show how many units of $R_i$ and $Z_j$ are required for producing
one unit of $Z_k$ and $E_\ell$, respectively:

[Two tables; column headings of the first table: $Z_1, Z_2, Z_3, Z_4, Z_5$. The table entries are not recoverable.]

For instance, five units of $R_2$ and one unit of $R_3$ are required for producing one
unit of $Z_1$.


(a) Determine, with the help of matrix operations, a corresponding table which
shows how many units of $R_i$ are required for producing one unit of $E_\ell$.

(b) Determine how many units of the four raw materials are required for producing
100 units of $E_1$, 200 units of $E_2$ and 300 units of $E_3$.


Chapter 5 
The Echelon Form and the Rank of Matrices 


In this chapter we develop a systematic method for transforming a matrix A with 
entries from a field into a special form which is called the echelon form of A. The 
transformation consists of a sequence of multiplications of A from the left by certain 
“elementary matrices”. If A is invertible, then its echelon form is the identity matrix, 
and the inverse $A^{-1}$ is the product of these elementary matrices. For a
non-invertible matrix its echelon form is, in some sense, the “closest possible” matrix 
to the identity matrix. This form motivates the concept of the rank of a matrix, which 
we introduce in this chapter and will use frequently later on. 


5.1 Elementary Matrices 


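Let $R$ be a commutative ring with unit, $n \in \mathbb{N}$ and $i, j \in \{1, \dots, n\}$. Let $I_n \in R^{n,n}$
be the identity matrix and let $e_i$ be its $i$th column, i.e., $I_n = [e_1, \dots, e_n]$.
We define
\[ E_{ij} := e_i e_j^T = [0, \dots, 0, \underbrace{e_i}_{\text{column } j}, 0, \dots, 0] \in R^{n,n}, \]
i.e., the entry $(i, j)$ of $E_{ij}$ is 1, all other entries are 0.
For $n \ge 2$ and $i < j$ we define

\[ P_{ij} := [e_1, \dots, e_{i-1}, e_j, e_{i+1}, \dots, e_{j-1}, e_i, e_{j+1}, \dots, e_n] \in R^{n,n}. \tag{5.1} \]

Thus, $P_{ij}$ is a permutation matrix (cp. Definition 4.12) obtained by exchanging the
columns $i$ and $j$ of $I_n$. A multiplication of $A \in R^{n,m}$ from the left with $P_{ij}$ means an
exchange of the rows $i$ and $j$ of $A$. For example,
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
P_{13} = [e_3, e_2, e_1] = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
P_{13}A = \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix}.
\]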


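For $\lambda \in R$ we define
\[ M_i(\lambda) := [e_1, \dots, e_{i-1}, \lambda e_i, e_{i+1}, \dots, e_n] \in R^{n,n}. \tag{5.2} \]
Thus, $M_i(\lambda)$ is a diagonal matrix obtained by replacing the $i$th column of $I_n$ by $\lambda e_i$.
A multiplication of $A \in R^{n,m}$ from the left with $M_i(\lambda)$ means a multiplication of the
$i$th row of $A$ by $\lambda$. For example,
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
M_2(-1) = [e_1, -e_2, e_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2(-1)A = \begin{bmatrix} 1 & 2 & 3 \\ -4 & -5 & -6 \\ 7 & 8 & 9 \end{bmatrix}.
\]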


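For $n \ge 2$, $i < j$ and $\lambda \in R$ we define

\[ G_{ij}(\lambda) := I_n + \lambda E_{ji} = [e_1, \dots, e_{i-1}, e_i + \lambda e_j, e_{i+1}, \dots, e_n] \in R^{n,n}. \tag{5.3} \]

Thus, the lower triangular matrix $G_{ij}(\lambda)$ is obtained by replacing the $i$th column of
$I_n$ by $e_i + \lambda e_j$. A multiplication of $A \in R^{n,m}$ from the left with $G_{ij}(\lambda)$ means that
$\lambda$ times the $i$th row of $A$ is added to the $j$th row of $A$. Similarly, a multiplication of
$A \in R^{n,m}$ from the left by the upper triangular matrix $G_{ij}(\lambda)^T$ means that $\lambda$ times
the $j$th row of $A$ is added to the $i$th row of $A$. For example,
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
G_{23}(-1) = [e_1, e_2 - e_3, e_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix},
\]
\[
G_{23}(-1)A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 3 & 3 & 3 \end{bmatrix}, \quad
G_{23}(-1)^T A = \begin{bmatrix} 1 & 2 & 3 \\ -3 & -3 & -3 \\ 7 & 8 & 9 \end{bmatrix}.
\]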


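Lemma 5.1 The elementary matrices $P_{ij}$, $M_i(\lambda)$ for invertible $\lambda \in R$, and $G_{ij}(\lambda)$
defined in (5.1), (5.2), and (5.3), respectively, are invertible and have the following
inverses:

(1) $P_{ij}^{-1} = P_{ij}^T = P_{ij}$.
(2) $M_i(\lambda)^{-1} = M_i(\lambda^{-1})$.
(3) $G_{ij}(\lambda)^{-1} = G_{ij}(-\lambda)$.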
Proof 


(1) The invertibility of $P_{ij}$ with $P_{ij}^{-1} = P_{ij}^T$ was already shown in Theorem 4.16;
the symmetry of $P_{ij}$ is easily seen.
(2) Since $\lambda \in R$ is invertible, the matrix $M_i(\lambda^{-1})$ is well defined. A straightforward
computation now shows that $M_i(\lambda^{-1})M_i(\lambda) = M_i(\lambda)M_i(\lambda^{-1}) = I_n$.
(3) Since $e_i^T e_j = 0$ for $i < j$, we have $E_{ji}^2 = (e_j e_i^T)(e_j e_i^T) = 0$, and therefore
\[
G_{ij}(\lambda)G_{ij}(-\lambda) = (I_n + \lambda E_{ji})(I_n + (-\lambda)E_{ji})
= I_n + \lambda E_{ji} + (-\lambda)E_{ji} + (-\lambda^2)E_{ji}^2 = I_n.
\]
A similar computation shows that $G_{ij}(-\lambda)G_{ij}(\lambda) = I_n$. □
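
The three types of elementary matrices and the statements of Lemma 5.1 can be tried out directly. The following is a small sketch over the rational numbers; the function names `P`, `M`, `G` mirror the notation of (5.1)-(5.3) but are otherwise an assumption of this illustration, as is the value of `lam`.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def P(n, i, j):
    """P_ij from (5.1): I_n with columns i and j exchanged (1-based indices)."""
    X = identity(n)
    X[i - 1], X[j - 1] = X[j - 1], X[i - 1]  # for I_n, swapping rows gives the same matrix
    return X

def M(n, i, lam):
    """M_i(lam) from (5.2): I_n with the (i, i) entry replaced by lam."""
    X = identity(n)
    X[i - 1][i - 1] = Fraction(lam)
    return X

def G(n, i, j, lam):
    """G_ij(lam) = I_n + lam * E_ji from (5.3), for i < j."""
    X = identity(n)
    X[j - 1][i - 1] = Fraction(lam)
    return X

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lam = Fraction(5, 7)  # an arbitrary invertible scalar

checks = [
    # the three examples from the text
    matmul(P(3, 1, 3), A) == [[7, 8, 9], [4, 5, 6], [1, 2, 3]],
    matmul(M(3, 2, -1), A) == [[1, 2, 3], [-4, -5, -6], [7, 8, 9]],
    matmul(G(3, 2, 3, -1), A) == [[1, 2, 3], [4, 5, 6], [3, 3, 3]],
    matmul(transpose(G(3, 2, 3, -1)), A) == [[1, 2, 3], [-3, -3, -3], [7, 8, 9]],
    # the inverses from Lemma 5.1
    matmul(P(3, 1, 3), P(3, 1, 3)) == identity(3),
    matmul(M(3, 2, lam), M(3, 2, 1 / lam)) == identity(3),
    matmul(G(3, 2, 3, lam), G(3, 2, 3, -lam)) == identity(3),
]
print(all(checks))
```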


5.2 The Echelon Form and Gaussian Elimination 


The constructive proof of the following theorem relies on the Gaussian elimination
algorithm.¹ For a given matrix $A \in K^{n,m}$, where $K$ is a field, this algorithm constructs
a matrix $S \in GL_n(K)$ such that $SA = C$ is quasi-upper triangular. We obtain this
special form by left-multiplication of $A$ with elementary matrices $P_{ij}$, $M_i(\lambda)$ and
$G_{ij}(\lambda)$. Each of these left-multiplications corresponds to the application of one of
the so-called “elementary row operations” to the matrix $A$:


• $P_{ij}$: exchange two rows of $A$.
• $M_i(\lambda)$: multiply a row of $A$ with an invertible scalar.
• $G_{ij}(\lambda)$: add a multiple of one row of $A$ to another row of $A$.


We assume that the entries of $A$ are in a field (rather than a ring) because in the proof
of the theorem we require that nonzero entries of $A$ are invertible. A generalization of
the result which holds over certain rings (e.g. the integers $\mathbb{Z}$) is given by the Hermite
normal form,² which plays an important role in Number Theory.


Theorem 5.2 Let $K$ be a field and let $A \in K^{n,m}$. Then there exist invertible matrices
$S_1, \dots, S_t \in K^{n,n}$ (these are products of elementary matrices) such that $C :=
S_t \cdots S_1 A$ is in echelon form, i.e., either $C = 0$ or
\[
C = \begin{bmatrix}
0 & 1 & \star & 0 & \star &        & 0      & \star \\
  &   &       & 1 & \star &        & \vdots & \vdots \\
  &   &       &   &       & \ddots & 0      & \star \\
  &   &       &   &       &        & 1      & \star \\
  &   & 0     &   &       &        &        & 0
\end{bmatrix}.
\]
Here $\star$ denotes an arbitrary (zero or nonzero) entry of $C$.

More precisely, $C = [c_{ij}]$ is either the zero matrix, or there exists a sequence of
natural numbers $j_1, \dots, j_r$ (these are called the “steps” of the echelon form), where
$1 \le j_1 < \dots < j_r \le m$ and $1 \le r \le \min\{n, m\}$, such that

(1) $c_{ij} = 0$ for $1 \le i \le r$ and $1 \le j < j_i$,

(2) $c_{ij} = 0$ for $r < i \le n$ and $1 \le j \le m$,

(3) $c_{i,j_i} = 1$ for $1 \le i \le r$ and all other entries in column $j_i$ are zero.

If $n = m$, then $A \in K^{n,n}$ is invertible if and only if $C = I_n$. In this case $A^{-1} =
S_t \cdots S_1$.


¹Named after Carl Friedrich Gauß (1777-1855). A similar method was already described in Chap. 8,
“Rectangular Arrays”, of the “Nine Chapters on the Mathematical Art”. This text developed in
ancient China over several decades BC stated problems of everyday life and gave practical
mathematical solution methods. A detailed commentary and analysis was written by Liu Hui (approx.
220-280 AD) around 260 AD.

²Charles Hermite (1822-1901).


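Proof If $A = 0$, then we set $t = 1$, $S_1 = I_n$, $C = 0$ and we are done.
Now let $A \neq 0$ and let $j_1$ be the index of the first column of
\[ A^{(1)} = \big[a^{(1)}_{ij}\big] := A \]
that does not consist of all zeros. Let $a^{(1)}_{i_1,j_1}$ be the first entry in this column that is
nonzero, i.e., $A^{(1)}$ has the form
\[
A^{(1)} = \left[\begin{array}{c|c|c}
  & 0 & \\
  & \vdots & \\
  & 0 & \\
0 & a^{(1)}_{i_1,j_1} & \star \\
  & \star & \\
  & \vdots & \\
  & \star &
\end{array}\right], \quad \text{where the middle column is column } j_1.
\]
We then proceed as follows: First we permute the rows $i_1$ and $1$ (if $i_1 > 1$). Then we
normalize the new first row, i.e., we multiply it with $\big(a^{(1)}_{i_1,j_1}\big)^{-1}$. Finally we eliminate
the nonzero elements below the first entry in column $j_1$. Permuting and normalizing
leads to
\[
\widetilde{A}^{(1)} = \big[\widetilde{a}^{(1)}_{ij}\big] := M_1\Big(\big(a^{(1)}_{i_1,j_1}\big)^{-1}\Big) P_{1,i_1} A^{(1)}
= \left[\begin{array}{c|c|c}
  & 1 & \\
0 & \widetilde{a}^{(1)}_{2,j_1} & \star \\
  & \vdots & \\
  & \widetilde{a}^{(1)}_{n,j_1} &
\end{array}\right], \quad \text{where the middle column is column } j_1.
\]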


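If $i_1 = 1$, then we set $P_{1,1} := I_n$. In order to eliminate below the 1 in column $j_1$, we
multiply $\widetilde{A}^{(1)}$ from the left with the matrices
\[ G_{1,n}\big(-\widetilde{a}^{(1)}_{n,j_1}\big), \; \dots, \; G_{1,2}\big(-\widetilde{a}^{(1)}_{2,j_1}\big). \]
Then we have
\[
S_1 A^{(1)} = \left[\begin{array}{c|c|c} 0 & 1 & \star \\ \hline 0 & 0 & A^{(2)} \end{array}\right], \quad \text{where the 1 is in column } j_1,
\]
where
\[
S_1 := G_{1,n}\big(-\widetilde{a}^{(1)}_{n,j_1}\big) \cdots G_{1,2}\big(-\widetilde{a}^{(1)}_{2,j_1}\big)\, M_1\Big(\big(a^{(1)}_{i_1,j_1}\big)^{-1}\Big) P_{1,i_1}
\]
and $A^{(2)} = \big[a^{(2)}_{ij}\big]$ with $i = 2, \dots, n$, $j = j_1 + 1, \dots, m$, i.e., we keep the indices of
the larger matrix $A^{(1)}$ in the smaller matrix $A^{(2)}$.

If $A^{(2)} = [\ ]$ or $A^{(2)} = 0$, then we are finished, since then $C := S_1 A^{(1)}$ is in
echelon form. In this case $r = 1$.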

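If at least one of the entries of $A^{(2)}$ is nonzero, then we apply the steps described
above to the matrix $A^{(2)}$. For $k = 2, 3, \dots$ we define the matrices $S_k$ recursively as
\[
S_k = \begin{bmatrix} I_{k-1} & 0 \\ 0 & \widetilde{S}_k \end{bmatrix}, \quad \text{where} \quad
\widetilde{S}_k A^{(k)} = \left[\begin{array}{c|c|c} 0 & 1 & \star \\ \hline 0 & 0 & A^{(k+1)} \end{array}\right], \quad \text{with the 1 in column } j_k.
\]
Each matrix $\widetilde{S}_k$ is constructed analogous to $S_1$: First we identify the first column $j_k$
of $A^{(k)}$ that is not completely zero, as well as the first nonzero entry $a^{(k)}_{i_k,j_k}$ in that
column. Then permuting and normalizing yields the matrix
\[
\widetilde{A}^{(k)} = \big[\widetilde{a}^{(k)}_{ij}\big] := M_k\Big(\big(a^{(k)}_{i_k,j_k}\big)^{-1}\Big) P_{k,i_k} A^{(k)}.
\]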
If $k = i_k$, then we set $P_{k,k} := I_{n-k+1}$. Now
\[
\widetilde{S}_k = G_{k,n}\big(-\widetilde{a}^{(k)}_{n,j_k}\big) \cdots G_{k,k+1}\big(-\widetilde{a}^{(k)}_{k+1,j_k}\big)\, M_k\Big(\big(a^{(k)}_{i_k,j_k}\big)^{-1}\Big) P_{k,i_k},
\]
so that $S_k$ is indeed a product of elementary matrices of the form
\[ \begin{bmatrix} I_{k-1} & 0 \\ 0 & T \end{bmatrix}, \]
where $T$ is an elementary matrix of size $(n - k + 1) \times (n - k + 1)$.
If we continue this procedure inductively, it will end after $r \le \min\{n, m\}$ steps
with either $A^{(r+1)} = 0$ or $A^{(r+1)} = [\ ]$.
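After $r$ steps we have
\[
S_r \cdots S_1 A^{(1)} = \begin{bmatrix}
0 & 1 & \star & \star & \star & \cdots & \star  & \star \\
  &   &       & 1     & \star & \cdots & \star  & \star \\
  &   &       &       &       & \ddots & \vdots & \vdots \\
  &   &       &       &       &        & 1      & \star \\
  &   & 0     &       &       &        &        & 0
\end{bmatrix}. \tag{5.4}
\]
By construction, the entries 1 in (5.4) are in the positions
\[ (1, j_1), \; (2, j_2), \; \dots, \; (r, j_r). \]
If $r = 1$, then $S_1 A^{(1)}$ is in echelon form (see the discussion at the beginning of
the proof). If $r > 1$, then we still have to eliminate the nonzero entries above the 1
in columns $j_2, \dots, j_r$. To do this, we denote the matrix in (5.4) by $R^{(1)} = \big[r^{(1)}_{ij}\big]$ and
form for $k = 2, \dots, r$ recursively
\[ R^{(k)} = \big[r^{(k)}_{ij}\big] := S_{r+k-1} R^{(k-1)}, \]
where
\[ S_{r+k-1} := G_{1,k}\big(-r^{(k-1)}_{1,j_k}\big)^T \cdots G_{k-1,k}\big(-r^{(k-1)}_{k-1,j_k}\big)^T. \]
For $t := 2r - 1$ we have $C := S_t S_{t-1} \cdots S_1 A$ in echelon form.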

Suppose now that $n = m$ and that $C = S_t S_{t-1} \cdots S_1 A$ is in echelon form. If $A$ is
invertible, then $C$ is a product of invertible matrices and thus invertible. An invertible
matrix cannot have a row containing only zeros, so that $r = n$ and hence $C = I_n$.
If, on the other hand, $C = I_n$, then the invertibility of the elementary matrices
implies that $S_1^{-1} \cdots S_t^{-1} = A$. As a product of invertible matrices, $A$ is invertible and
$A^{-1} = S_t \cdots S_1$. □

In the literature, the echelon form is sometimes called reduced row echelon form. 
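
The construction in the proof of Theorem 5.2 translates almost literally into a program. The following sketch works over the rational numbers and follows the two passes of the proof: first permute, normalize and eliminate below each pivot, then eliminate above the pivots. Instead of storing the individual elementary matrices, it applies every row operation to the augmented matrix $[A, I_n]$, so that the right block accumulates their product $S$. The function name `echelon_form` and this bookkeeping are assumptions of the sketch, not part of the text.

```python
from fractions import Fraction

def echelon_form(A):
    """Return (C, S, steps): S is invertible, C = S*A is in echelon form,
    and steps = [j_1, ..., j_r] are the (1-based) pivot columns."""
    n, m = len(A), len(A[0])
    # Work on [A, I_n]; every row operation acts on both blocks.
    W = [[Fraction(v) for v in row] + [Fraction(int(i == k)) for k in range(n)]
         for i, row in enumerate(A)]
    pivots, k = [], 0
    for j in range(m):                       # first pass: forward elimination
        rows = [i for i in range(k, n) if W[i][j] != 0]
        if not rows:
            continue                         # column j has no pivot
        W[k], W[rows[0]] = W[rows[0]], W[k]  # P: exchange rows
        piv = W[k][j]
        W[k] = [v / piv for v in W[k]]       # M: normalize the pivot row
        for i in range(k + 1, n):            # G: eliminate below the pivot
            f = W[i][j]
            W[i] = [v - f * w for v, w in zip(W[i], W[k])]
        pivots.append(j)
        k += 1
        if k == n:
            break
    for k, j in enumerate(pivots):           # second pass: eliminate above the pivots
        for i in range(k):
            f = W[i][j]
            W[i] = [v - f * w for v, w in zip(W[i], W[k])]
    C = [row[:m] for row in W]
    S = [row[m:] for row in W]
    return C, S, [j + 1 for j in pivots]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# A 3 x 5 test matrix over Q.
A = [[0, 2, 1, 3, 3], [0, 2, 0, 1, 1], [0, 2, 0, 1, 1]]
C, S, steps = echelon_form(A)
print(steps)               # pivot columns j_1, ..., j_r
print(matmul(S, A) == C)   # S*A reproduces the echelon form
```

For an invertible input matrix the returned `S` is the inverse, in agreement with the last statement of Theorem 5.2.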


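Example 5.3 Transformation of a matrix from $\mathbb{Q}^{3,5}$ to echelon form via left multiplication
with elementary matrices:
\[
\begin{bmatrix} 0 & 2 & 1 & 3 & 3 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix}
\;\xrightarrow[M_1(\frac12)]{j_1 = 2,\; i_1 = 1}\;
\begin{bmatrix} 0 & 1 & \frac12 & \frac32 & \frac32 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix}
\;\xrightarrow{G_{13}(-2)}\;
\begin{bmatrix} 0 & 1 & \frac12 & \frac32 & \frac32 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 0 & -1 & -2 & -2 \end{bmatrix}
\]
\[
\;\xrightarrow{G_{12}(-2)}\;
\begin{bmatrix} 0 & 1 & \frac12 & \frac32 & \frac32 \\ 0 & 0 & -1 & -2 & -2 \\ 0 & 0 & -1 & -2 & -2 \end{bmatrix}
\;\xrightarrow[M_2(-1)]{j_2 = 3,\; i_2 = 2}\;
\begin{bmatrix} 0 & 1 & \frac12 & \frac32 & \frac32 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & -1 & -2 & -2 \end{bmatrix}
\]
\[
\;\xrightarrow{G_{23}(1)}\;
\begin{bmatrix} 0 & 1 & \frac12 & \frac32 & \frac32 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\;\xrightarrow{G_{12}(-\frac12)^T}\;
\begin{bmatrix} 0 & 1 & 0 & \frac12 & \frac12 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]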

MATLAB-Minute. 

The echelon form is computed in MATLAB with the command
rref (“reduced row echelon form”). Apply rref to [A eye(n+1)] in
order to compute the inverse of the matrix A=full(gallery('tridiag',
-ones(n,1),2*ones(n+1,1),-ones(n,1))) for n=1,2,3,4,5 (cp. Exercise 3.3).

Formulate a conjecture about the general form of A~!. (Can you prove your 
conjecture?) 


The proof of Theorem 5.2 leads to the so-called LU-decomposition of a square
matrix. 


Theorem 5.4 For every matrix $A \in K^{n,n}$, there exists a permutation matrix $P \in
K^{n,n}$, a lower triangular matrix $L \in GL_n(K)$ with ones on the diagonal and an
upper triangular matrix $U \in K^{n,n}$, such that $A = PLU$. The matrix $U$ is invertible
if and only if $A$ is invertible.
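Proof For $A \in K^{n,n}$ the Eq. (5.4) has the form $S_n \cdots S_1 A = \widetilde{U}$, where $\widetilde{U}$ is upper
triangular. If $r < n$, then we set $S_n = S_{n-1} = \dots = S_{r+1} = I_n$. Since the matrices
$S_1, \dots, S_n$ are invertible, it follows that $\widetilde{U}$ is invertible if and only if $A$ is invertible.
For $i = 1, \dots, n$ every matrix $S_i$ has the form
\[
S_i = \begin{bmatrix}
1 &        &   &           &   &        &   \\
  & \ddots &   &           &   &        &   \\
  &        & 1 &           &   &        &   \\
  &        &   & s_{i,i}   &   &        &   \\
  &        &   & s_{i+1,i} & 1 &        &   \\
  &        &   & \vdots    &   & \ddots &   \\
  &        &   & s_{n,i}   &   &        & 1
\end{bmatrix} P_{i,j_i},
\]
where $j_i \ge i$ for $i = 1, \dots, n$ and $P_{i,i} := I_n$ (if $j_i = i$, then no permutation was
necessary). Therefore,
\[
S_n \cdots S_1 =
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & s_{n,n} \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & s_{n-1,n-1} & \\ & & s_{n,n-1} & 1 \end{bmatrix} P_{n-1,j_{n-1}}
\cdots
\begin{bmatrix} 1 & & & & \\ & s_{22} & & & \\ & s_{32} & 1 & & \\ & \vdots & & \ddots & \\ & s_{n2} & & & 1 \end{bmatrix} P_{2,j_2}
\begin{bmatrix} s_{11} & & & \\ s_{21} & 1 & & \\ \vdots & & \ddots & \\ s_{n1} & & & 1 \end{bmatrix} P_{1,j_1}.
\]
The form of the permutation matrices for $k = 2, \dots, n-1$ and $\ell = 1, \dots, k-1$
implies that
\[
P_{k,j_k}
\begin{bmatrix}
1 &        &                  &   &        &   \\
  & \ddots &                  &   &        &   \\
  &        & s_{\ell,\ell}    &   &        &   \\
  &        & s_{\ell+1,\ell}  & 1 &        &   \\
  &        & \vdots           &   & \ddots &   \\
  &        & s_{n,\ell}       &   &        & 1
\end{bmatrix}
=
\begin{bmatrix}
1 &        &                              &   &        &   \\
  & \ddots &                              &   &        &   \\
  &        & s_{\ell,\ell}                &   &        &   \\
  &        & \widetilde{s}_{\ell+1,\ell}  & 1 &        &   \\
  &        & \vdots                       &   & \ddots &   \\
  &        & \widetilde{s}_{n,\ell}       &   &        & 1
\end{bmatrix}
P_{k,j_k}
\]
holds for certain $\widetilde{s}_{j,\ell} \in K$, $j = \ell+1, \dots, n$. Hence,
\[
S_n \cdots S_1 =
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & s_{n,n} \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & s_{n-1,n-1} & \\ & & \widetilde{s}_{n,n-1} & 1 \end{bmatrix}
\cdots
\begin{bmatrix} 1 & & & & \\ & s_{22} & & & \\ & \widetilde{s}_{32} & 1 & & \\ & \vdots & & \ddots & \\ & \widetilde{s}_{n2} & & & 1 \end{bmatrix}
\begin{bmatrix} s_{11} & & & \\ \widetilde{s}_{21} & 1 & & \\ \vdots & & \ddots & \\ \widetilde{s}_{n1} & & & 1 \end{bmatrix}
P_{n-1,j_{n-1}} \cdots P_{1,j_1}.
\]
The invertible lower triangular matrices and the permutation matrices form groups
with respect to the matrix multiplication (cp. Theorems 4.13 and 4.16). Thus,
$S_n \cdots S_1 = \widetilde{L}\widetilde{P}$, where $\widetilde{L}$ is invertible and lower triangular, and $\widetilde{P}$ is a permutation
matrix. Since $\widetilde{L} = [\widetilde{l}_{ij}]$ is invertible, also $D := \mathrm{diag}(\widetilde{l}_{11}, \dots, \widetilde{l}_{nn})$ is invertible,
and we obtain $A = PLU$ with $P := \widetilde{P}^{-1} = \widetilde{P}^T$, $L := \widetilde{L}^{-1}D$ and $U := D^{-1}\widetilde{U}$. By
construction, all diagonal entries of $L$ are equal to one. □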


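Example 5.5 Computation of an LU-decomposition of a matrix from $\mathbb{Q}^{3,3}$:
\[
\begin{bmatrix} 2 & 2 & 4 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix}
\;\xrightarrow[M_1(\frac12)]{j_1 = 1,\; i_1 = 1}\;
\begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \\ 2 & 0 & 1 \end{bmatrix}
\;\xrightarrow{G_{13}(-2)}\;
\begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \\ 0 & -2 & -3 \end{bmatrix}
\]
\[
\;\xrightarrow{G_{12}(-2)}\;
\begin{bmatrix} 1 & 1 & 2 \\ 0 & 0 & -3 \\ 0 & -2 & -3 \end{bmatrix}
\;\xrightarrow{P_{23}}\;
\begin{bmatrix} 1 & 1 & 2 \\ 0 & -2 & -3 \\ 0 & 0 & -3 \end{bmatrix} = \widetilde{U}.
\]
Hence, $\widetilde{P} = P_{23}$,
\[
\widetilde{L} = G_{12}(-2)\,G_{13}(-2)\,M_1\big(\tfrac12\big) = \begin{bmatrix} \frac12 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}, \quad
D = \mathrm{diag}\big(\tfrac12, 1, 1\big),
\]
and thus, $P = \widetilde{P}^T = P_{23}^T = P_{23}$,
\[
L = \widetilde{L}^{-1}D = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad
U = D^{-1}\widetilde{U} = \begin{bmatrix} 2 & 2 & 4 \\ 0 & -2 & -3 \\ 0 & 0 & -3 \end{bmatrix}.
\]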

If $A \in GL_n(K)$, then the LU-decomposition yields $A^{-1} = U^{-1}L^{-1}P^T$. Hence
after computing the LU-decomposition, one obtains the inverse of $A$ essentially by
inverting the two triangular matrices. Since this can be achieved by the efficient
recursive formula (4.4), the LU-decomposition is a popular method in scientific
computing applications that require the inversion of matrices or the solution of linear
systems of equations (cp. Chap. 6). In this context, however, alternative strategies
for the choice of the permutation matrices are used. For example, instead of the first
nonzero entry in a column one chooses an entry with large (or largest) absolute value
for the row exchange and the subsequent elimination. By this strategy the influence
of rounding errors in the computation is reduced.
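
The effect of the pivot choice on rounding errors can be seen already for a 2 x 2 system in floating point arithmetic. The following sketch solves the same system twice, once with the first nonzero entry as pivot and once with the entry of largest absolute value. The test system and the function name `solve` are assumptions made for this illustration; the matrix is assumed to be invertible.

```python
def solve(A, b, largest_pivot):
    """Gaussian elimination with back substitution in floating point arithmetic.
    Assumes that A is square and invertible."""
    n = len(A)
    W = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        candidates = [i for i in range(k, n) if W[i][k] != 0.0]
        if largest_pivot:
            p = max(candidates, key=lambda i: abs(W[i][k]))  # largest absolute value
        else:
            p = candidates[0]                                # first nonzero entry
        W[k], W[p] = W[p], W[k]
        for i in range(k + 1, n):
            f = W[i][k] / W[k][k]
            W[i] = [v - f * w for v, w in zip(W[i], W[k])]
    x = [0.0] * n
    for i in reversed(range(n)):                             # back substitution
        s = sum(W[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (W[i][n] - s) / W[i][i]
    return x

# The exact solution of this system is very close to x = [1, 1].
A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
x_first = solve(A, b, largest_pivot=False)
x_largest = solve(A, b, largest_pivot=True)
print(x_first, x_largest)
```

With the tiny pivot 1e-20 the multiplier becomes huge and the information in the second row is lost to rounding, so the first component of the computed solution is completely wrong, while the row exchange avoids this.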


MATLAB-Minute. 

The Hilbert matrix³ $A = [a_{ij}] \in \mathbb{Q}^{n,n}$ has the entries $a_{ij} = 1/(i + j - 1)$
for $i, j = 1, \dots, n$. It can be generated in MATLAB with the command
hilb(n). Carry out the command [L,U,P]=lu(hilb(4)) in order to compute
an LU-decomposition of the matrix hilb(4). What do the matrices P, L
and U look like?

Compute also the LU-decomposition of the matrix
full(gallery('tridiag',-ones(3,1),2*ones(4,1),-ones(3,1)))
and study the corresponding matrices P, L and U.


We will now show that, for a given matrix A, the matrix C in Theorem 5.2 is 
uniquely determined in a certain sense. For this we need the following definition. 


Definition 5.6 If $C \in K^{n,m}$ is in echelon form (as in Theorem 5.2), then the positions
$(1, j_1), \dots, (r, j_r)$ are called the pivot positions of $C$.


We also need the following results. 
Lemma 5.7 If $Z \in GL_n(K)$ and $x \in K^{n,1}$, then $Zx = 0$ if and only if $x = 0$.


Proof Exercise. o 


Theorem 5.8 Let $A, B \in K^{n,m}$ be in echelon form. If $A = ZB$ for a matrix $Z \in
GL_n(K)$, then $A = B$.


³David Hilbert (1862-1943).


Proof If $B$ is the zero matrix, then $A = ZB = 0$, and hence $A = B$.

Let now $B \neq 0$ and let $A, B$ have the respective columns $a_i, b_i$, $1 \le i \le m$.
Furthermore, let $(1, j_1), \dots, (r, j_r)$ be the $r \ge 1$ pivot positions of $B$. We will show
that every matrix $Z \in GL_n(K)$ with $A = ZB$ has the form
\[ Z = \begin{bmatrix} I_r & \star \\ 0 & Z_{n-r} \end{bmatrix}, \]
where $Z_{n-r} \in GL_{n-r}(K)$. Since $B$ is in echelon form and all entries of $B$ below its
row $r$ are zero, it then follows that $B = ZB = A$.

Since $(1, j_1)$ is the first pivot position of $B$, we have $b_i = 0 \in K^{n,1}$ for $1 \le i \le
j_1 - 1$ and $b_{j_1} = e_1$ (the first column of $I_n$). Then $A = ZB$ implies $a_i = 0 \in K^{n,1}$
for $1 \le i \le j_1 - 1$ and $a_{j_1} = Zb_{j_1} = Ze_1$. Since $Z$ is invertible, Lemma 5.7 implies
that $a_{j_1} \neq 0 \in K^{n,1}$. Since $A$ is in echelon form, $a_{j_1} = e_1 = b_{j_1}$. Furthermore,
\[ Z = Z_n := \begin{bmatrix} 1 & \star \\ 0 & Z_{n-1} \end{bmatrix}, \]
where $Z_{n-1} \in GL_{n-1}(K)$ (cp. Exercise 5.3). If $r = 1$, then we are done.

If $r > 1$, then we proceed with the other pivot positions in an analogous way:
Since $B$ is in echelon form, the $k$th pivot position gives $b_{j_k} = e_k$. From $a_{j_k} = Zb_{j_k}$
and the invertibility of $Z_{n-k+1}$ we obtain $a_{j_k} = b_{j_k}$ and
\[ Z = \begin{bmatrix} I_k & \star \\ 0 & Z_{n-k} \end{bmatrix}, \]
where $Z_{n-k} \in GL_{n-k}(K)$. □


This result yields the uniqueness of the echelon form of a matrix and its invariance 
under left-multiplication with invertible matrices. 


Corollary 5.9 For $A \in K^{n,m}$ the following assertions hold:


(1) There is a unique matrix $C \in K^{n,m}$ in echelon form to which $A$ can be transformed
by elementary row operations, i.e., by left-multiplication with elementary
matrices. This matrix $C$ is called the echelon form of $A$.

(2) If $M \in GL_n(K)$, then the matrix $C$ in (1) is also the echelon form of $MA$, i.e.,
the echelon form of a matrix is invariant under left-multiplication with invertible
matrices.


Proof


(1) If $S_1 A = C_1$ and $S_2 A = C_2$, where $C_1, C_2$ are in echelon form and $S_1, S_2$ are
invertible, then $C_1 = (S_1 S_2^{-1}) C_2$. Theorem 5.8 now gives $C_1 = C_2$.

(2) If $M \in GL_n(K)$ and $S_3(MA) = C_3$ is in echelon form, then with $S_1 A = C_1$
from (1) we get $C_3 = (S_3 M S_1^{-1}) C_1$. Theorem 5.8 now gives $C_3 = C_1$. □
5.3 Rank and Equivalence of Matrices


As we have seen in Corollary 5.9, the echelon form of $A \in K^{n,m}$ is unique. In
particular, for every matrix $A \in K^{n,m}$, there exists a unique number of pivot positions
(cp. Definition 5.6) in its echelon form. This justifies the following definition.


Definition 5.10 The number $r$ of pivot positions in the echelon form of $A \in K^{n,m}$
is called the rank⁴ of $A$ and denoted by rank$(A)$.


We see immediately that for $A \in K^{n,m}$ always $0 \le \mathrm{rank}(A) \le \min\{n, m\}$, where
rank$(A) = 0$ if and only if $A = 0$. Moreover, Theorem 5.2 shows that $A \in K^{n,n}$ is
invertible if and only if rank$(A) = n$. Further properties of the rank are summarized
in the following theorem.


Theorem 5.11 For $A \in K^{n,m}$ the following assertions hold:
(1) There exist matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with
\[ QAZ = \begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix} \]
if and only if rank$(A) = r$.
(2) If $Q \in GL_n(K)$ and $Z \in GL_m(K)$, then rank$(A)$ = rank$(QAZ)$.
(3) If $A = BC$ with $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$, then

(a) rank$(A) \le$ rank$(B)$,
(b) rank$(A) \le$ rank$(C)$.

(4) rank$(A)$ = rank$(A^T)$.
(5) There exist matrices $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$ with $A = BC$ if and only if
rank$(A) \le \ell$.


Proof


(3a) Let $Q \in GL_n(K)$ be such that $QB$ is in echelon form. Then $QA = QBC$.
In the matrix $QBC$ at most the first rank$(B)$ rows contain nonzero entries. By
Corollary 5.9, the echelon form of $QA$ is equal to the echelon form of $A$. Thus,
in the echelon form of $A$ also at most the first rank$(B)$ rows will be
nonzero, which implies rank$(A) \le$ rank$(B)$.

(1) $\Leftarrow$: If rank$(A) = r = 0$, i.e., $A = 0$, then $I_r = [\ ]$ and the assertion holds for
arbitrary matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$.

If $r \ge 1$, then there exists a matrix $Q \in GL_n(K)$ such that $QA$ is in echelon
form with $r$ pivot positions. Then there exists a permutation matrix $P \in K^{m,m}$,
that is a product of elementary permutation matrices $P_{ij}$, with
\[ PA^TQ^T = \begin{bmatrix} I_r & 0_{r,n-r} \\ V & 0_{m-r,n-r} \end{bmatrix} \]
for some matrix $V \in K^{m-r,r}$. If $r = m$, then $V = [\ ]$. In the following, for
simplicity, we omit the sizes of the zero matrices. The matrix
\[ Y := \begin{bmatrix} I_r & 0 \\ -V & I_{m-r} \end{bmatrix} \in K^{m,m} \]
is invertible with
\[ Y^{-1} = \begin{bmatrix} I_r & 0 \\ V & I_{m-r} \end{bmatrix} \in K^{m,m}. \]
Thus,
\[ YPA^TQ^T = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \]
and with $Z := P^TY^T \in GL_m(K)$ we obtain
\[ QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \tag{5.5} \]

$\Rightarrow$: Suppose that (5.5) holds for $A \in K^{n,m}$ and matrices $Q \in GL_n(K)$ and
$Z \in GL_m(K)$. Then with (3a) we obtain
\[ \mathrm{rank}(A) = \mathrm{rank}(AZZ^{-1}) \le \mathrm{rank}(AZ) \le \mathrm{rank}(A), \]
and thus, in particular, rank$(A)$ = rank$(AZ)$. Due to the invariance of the
echelon form (and hence the rank) under left-multiplication with invertible
matrices (cp. Corollary 5.9), we get
\[ \mathrm{rank}(A) = \mathrm{rank}(AZ) = \mathrm{rank}(QAZ) = \mathrm{rank}\left(\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}\right) = r. \]

(2) If $A \in K^{n,m}$, $Q \in GL_n(K)$ and $Z \in GL_m(K)$, then the invariance of the rank
under left-multiplication with invertible matrices and (3a) can again be used
for showing that
\[ \mathrm{rank}(A) = \mathrm{rank}(QAZZ^{-1}) \le \mathrm{rank}(QAZ) = \mathrm{rank}(AZ) \le \mathrm{rank}(A), \]
and hence, in particular, rank$(A)$ = rank$(QAZ)$.

(4) If rank$(A) = r$, then by (1) there exist matrices $Q \in GL_n(K)$ and $Z \in
GL_m(K)$ with $QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$. Therefore,
\[
\mathrm{rank}(A) = \mathrm{rank}(QAZ) = \mathrm{rank}\left(\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}\right)
= \mathrm{rank}\left(\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}^T\right) = \mathrm{rank}\big((QAZ)^T\big)
= \mathrm{rank}(Z^TA^TQ^T) = \mathrm{rank}(A^T).
\]

(3b) Using (3a) and (4), we obtain
\[ \mathrm{rank}(A) = \mathrm{rank}(A^T) = \mathrm{rank}(C^TB^T) \le \mathrm{rank}(C^T) = \mathrm{rank}(C). \]

(5) Let $A = BC$ with $B \in K^{n,\ell}$, $C \in K^{\ell,m}$. Then by (3a),
\[ \mathrm{rank}(A) = \mathrm{rank}(BC) \le \mathrm{rank}(B) \le \ell. \]
Let, on the other hand, rank$(A) = r \le \ell$. Then there exist matrices $Q \in
GL_n(K)$ and $Z \in GL_m(K)$ with $QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$. Thus, we obtain
\[
A = \left( Q^{-1} \begin{bmatrix} I_r & 0_{r,\ell-r} \\ 0_{n-r,r} & 0_{n-r,\ell-r} \end{bmatrix} \right)
\left( \begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{\ell-r,r} & 0_{\ell-r,m-r} \end{bmatrix} Z^{-1} \right) =: BC,
\]
where $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$. □


⁴The concept of the rank was introduced (in the context of bilinear forms) first in 1879 by Ferdinand
Georg Frobenius (1849-1917).
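
Since the rank is the number of pivot positions, it can be computed by forward elimination alone; normalizing the pivots and eliminating above them does not change their number. The following sketch computes the rank over the rational numbers in this way and checks assertions (3), (4) and (5) of Theorem 5.11 on one instance. The function name `rank` and the factor matrices `B` and `C` are assumptions chosen for the illustration.

```python
from fractions import Fraction

def rank(A):
    """Number of pivot positions found by forward elimination over Q."""
    W = [[Fraction(v) for v in row] for row in A]
    n, m = len(W), len(W[0])
    r = 0
    for j in range(m):
        p = next((i for i in range(r, n) if W[i][j] != 0), None)
        if p is None:
            continue                       # no pivot in column j
        W[r], W[p] = W[p], W[r]
        for i in range(r + 1, n):          # eliminate below the pivot
            f = W[i][j] / W[r][j]
            W[i] = [v - f * w for v, w in zip(W[i], W[r])]
        r += 1
        if r == n:
            break
    return r

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# A 3 x 5 matrix and a hypothetical factorization A = B*C with inner dimension 2.
A = [[0, 2, 1, 3, 3], [0, 2, 0, 1, 1], [0, 2, 0, 1, 1]]
B = [[1, 0], [0, 1], [0, 1]]
C = [[0, 2, 1, 3, 3], [0, 2, 0, 1, 1]]

print(rank(A), rank(transpose(A)))   # assertion (4)
print(matmul(B, C) == A)             # a factorization as in assertion (5)
print(rank(A) <= min(rank(B), rank(C)))   # assertion (3)
```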


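Example 5.12 The matrix
\[
A = \begin{bmatrix} 0 & 2 & 1 & 3 & 3 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix} \in \mathbb{Q}^{3,5}
\]
from Example 5.3 has the echelon form
\[
\begin{bmatrix} 0 & 1 & 0 & \frac12 & \frac12 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]
Since there are two pivot positions, we have rank$(A) = 2$. Multiplying $A$ from the
right by
\[
B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \in \mathbb{Q}^{5,5},
\]
yields $AB = 0 \in \mathbb{Q}^{3,5}$, and hence rank$(AB) = 0 <$ rank$(A)$.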


Assertion (1) in Theorem 5.11 motivates the following definition. 
Definition 5.13 Two matrices $A, B \in K^{n,m}$ are called equivalent, if there exist
matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with $A = QBZ$.


As the name suggests, this defines an equivalence relation on the set $K^{n,m}$, since
the following properties hold:


• Reflexivity: $A = QAZ$ with $Q = I_n$ and $Z = I_m$.
• Symmetry: If $A = QBZ$, then $B = Q^{-1}AZ^{-1}$.
• Transitivity: If $A = Q_1BZ_1$ and $B = Q_2CZ_2$, then $A = (Q_1Q_2)C(Z_2Z_1)$.
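
As a small worked illustration (the matrices are chosen here and are not taken from the text), consider $A \in \mathbb{Q}^{2,2}$ with rank$(A) = 1$. One row operation and one column operation show that $A$ is equivalent to a matrix of the form appearing in (1) of Theorem 5.11:

```latex
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \quad
Q = G_{12}(-2) = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}, \quad
Z = G_{12}(-2)^T = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix},
\]
\[
QA = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}, \qquad
QAZ = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]
```

Here the left-multiplication with $Q$ acts on the rows of $A$, while the right-multiplication with $Z$ acts on the columns of $QA$.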


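The equivalence class of $A \in K^{n,m}$ is given by
\[ [A] = \{QAZ \mid Q \in GL_n(K) \text{ and } Z \in GL_m(K)\}. \]
If rank$(A) = r$, then by (1) in Theorem 5.11 we have
\[
\begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in [A]
\]
and, therefore,
\[ \left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right] = [A]. \]
Consequently, the rank of $A$ fully determines the equivalence class $[A]$. The matrix
\[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m} \]
is called the equivalence normal form of $A$. We obtain
\[
K^{n,m} = \bigcup_{r=0}^{\min\{n,m\}} \left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right], \quad \text{where} \quad
\left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right] \cap \left[ \begin{bmatrix} I_\ell & 0 \\ 0 & 0 \end{bmatrix} \right] = \emptyset, \text{ if } r \neq \ell.
\]
Hence there are $1 + \min\{n, m\}$ pairwise distinct equivalence classes, and
\[
\left\{ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m} \;\middle|\; r = 0, 1, \dots, \min\{n, m\} \right\}
\]
is a complete set of representatives.
From the proof of Theorem 4.9 we know that $(K^{n,n}, +, *)$ for $n \ge 2$ is a non-commutative
ring with unit that contains non-trivial zero divisors. Using the equivalence
normal form these can be characterized as follows: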

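• If $A \in K^{n,n}$ is invertible, then $A$ cannot be a zero divisor, since then $AB = 0$
implies that $B = 0$.
• If $A \in K^{n,n} \setminus \{0\}$ is a zero divisor, then $A$ cannot be invertible, and hence $1 \le
\mathrm{rank}(A) = r < n$, so that the equivalence normal form of $A$ is not the identity
matrix $I_n$. Let $Q, Z \in GL_n(K)$ be given with
\[ QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \]
Then for every matrix
\[ V := \begin{bmatrix} 0_{r,r} & 0_{r,n-r} \\ V_{21} & V_{22} \end{bmatrix} \in K^{n,n} \]
and $B := ZV$ we have
\[ AB = Q^{-1} \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0_{r,r} & 0_{r,n-r} \\ V_{21} & V_{22} \end{bmatrix} = 0. \]
If $V \neq 0$, then $B \neq 0$, since $Z$ is invertible.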


Exercises 
(In the following exercises K is an arbitrary field.) 


5.1 Compute the echelon forms of the matrices 


li -i0 
7123 24 fli 29 {00 Oi 44 
A=|7449/€°° B=|;;/€c ¢ ee oo 
01 00 
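\[
D = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \in (\mathbb{Z}/2\mathbb{Z})^{3,2}, \quad
E = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 2 & 0 & 1 & 1 \\ 1 & 2 & 0 & 2 \end{bmatrix} \in (\mathbb{Z}/3\mathbb{Z})^{3,4}.
\]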


(Here for simplicity the elements of Z/nZ are denoted by k instead of [k].) 
State the elementary matrices that carry out the transformations. If one of the 
matrices is invertible, then compute its inverse as a product of the elementary 
matrices. 


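5.2 Let $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in K^{2,2}$ with $\alpha\delta \neq \beta\gamma$. Determine the echelon form of $A$ and
a formula for $A^{-1}$.

5.3 Let $A = \begin{bmatrix} 1 & A_{12} \\ 0 & B \end{bmatrix} \in K^{n,n}$ with $A_{12} \in K^{1,n-1}$ and $B \in K^{n-1,n-1}$. Show that
$A \in GL_n(K)$ if and only if $B \in GL_{n-1}(K)$.

5.4 Consider the matrix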


t+1 t+ 


t+1 t-1 
A=|‘al ia | EKO, 


where $K(t)$ is the field of rational functions (cp. Exercise 3.19). Examine
whether $A$ is invertible and determine, if possible, $A^{-1}$. Verify your result by
computing $A^{-1}A$ and $AA^{-1}$.

5.5 Show that if $A \in GL_n(K)$, then the echelon form of $[A, I_n] \in K^{n,2n}$ is given
by $[I_n, A^{-1}]$.
(The inverse of an invertible matrix $A$ can thus be computed via the transformation
of $[A, I_n]$ to its echelon form.)

5.6 Two matrices $A, B \in K^{n,m}$ are called left equivalent, if there exists a matrix
$Q \in GL_n(K)$ with $A = QB$. Show that this defines an equivalence relation on
$K^{n,m}$ and determine a most simple representative for each equivalence class.

5.7 Prove Lemma 5.7.

5.8 Determine LU-decompositions (cp. Theorem 5.4) of the matrices


1230 2 0-2 0 

|4001 o |-4 0 4-1 M 

A= tagean ®=| o-1-1-2| © ® 
0100 0011 


If one of these matrices is invertible, then determine its inverse using its LU-decomposition.

5.9 Let $A$ be the $4 \times 4$ Hilbert matrix (cp. the MATLAB-Minute above Definition 5.6).
Determine rank$(A)$. Does $A$ have an LU-decomposition as in Theorem 5.4
with $P = I_4$?

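5.10 Determine the rank of the matrix
\[
A = \begin{bmatrix} 0 & \alpha & \beta \\ -\alpha & 0 & \gamma \\ -\beta & -\gamma & 0 \end{bmatrix} \in \mathbb{R}^{3,3}
\]
in dependence of $\alpha, \beta, \gamma \in \mathbb{R}$.

5.11 Let $A, B \in K^{n,n}$ be given. Show that
\[ \mathrm{rank}(A) + \mathrm{rank}(B) \le \mathrm{rank}\left( \begin{bmatrix} A & C \\ 0 & B \end{bmatrix} \right) \]
for all $C \in K^{n,n}$. Examine when this inequality is strict.

5.12 Let $a, b, c \in \mathbb{R}^{n,1}$.

(a) Determine rank$(ba^T)$.
(b) Let $M(a, b) := ba^T - ab^T$. Show the following assertions:
(i) $M(a, b) = -M(b, a)$ and $M(a, b)c + M(b, c)a + M(c, a)b = 0$,
(ii) $M(\lambda a + \mu b, c) = \lambda M(a, c) + \mu M(b, c)$ for $\lambda, \mu \in \mathbb{R}$,
(iii) rank$(M(a, b)) = 0$ if and only if there exist $\lambda, \mu \in \mathbb{R}$ with $\lambda \neq 0$ or
$\mu \neq 0$ and $\lambda a + \mu b = 0$,
(iv) rank$(M(a, b)) \in \{0, 2\}$.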


Chapter 6 
Linear Systems of Equations 


Solving linear systems of equations is a central problem of Linear Algebra that 
we discuss in an introductory way in this chapter. Such systems arise in numerous 
applications from engineering to the natural and social sciences. Major sources of 
linear systems of equations are the discretization of differential equations and the 
linearization of nonlinear equations. In this chapter we analyze the solution sets of 
linear systems of equations and we characterize the number of solutions using the 
echelon form from Chap. 5. We also develop an algorithm for the computation of the 
solutions. 


Definition 6.1 A linear system (of equations) over a field $K$ with $n$ equations in $m$
unknowns $x_1, \dots, x_m$ has the form
\[
\begin{aligned}
a_{11}x_1 + \dots + a_{1m}x_m &= b_1, \\
a_{21}x_1 + \dots + a_{2m}x_m &= b_2, \\
&\;\;\vdots \\
a_{n1}x_1 + \dots + a_{nm}x_m &= b_n
\end{aligned}
\]
or
\[ Ax = b, \]
where the coefficient matrix $A = [a_{ij}] \in K^{n,m}$ and the right hand side $b = [b_i] \in
K^{n,1}$ are given. If $b = 0$, then the linear system is called homogeneous, otherwise
non-homogeneous. Every $\widehat{x} \in K^{m,1}$ with $A\widehat{x} = b$ is called a solution of the linear
system. All these $\widehat{x}$ form the solution set of the linear system, which we denote by
$\mathcal{L}(A, b)$.


The next result characterizes the solution set $\mathcal{L}(A, b)$ of the linear system $Ax = b$
using the solution set $\mathcal{L}(A, 0)$ of the associated homogeneous linear system $Ax = 0$.
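Lemma 6.2 Let $A \in K^{n,m}$ and $b \in K^{n,1}$ with $\mathcal{L}(A, b) \neq \emptyset$ be given. If $\widehat{x} \in
\mathcal{L}(A, b)$, then
\[ \mathcal{L}(A, b) = \widehat{x} + \mathcal{L}(A, 0) := \{\widehat{x} + \widehat{z} \mid \widehat{z} \in \mathcal{L}(A, 0)\}. \]
Proof If $\widehat{z} \in \mathcal{L}(A, 0)$, and thus $\widehat{x} + \widehat{z} \in \widehat{x} + \mathcal{L}(A, 0)$, then
\[ A(\widehat{x} + \widehat{z}) = A\widehat{x} + A\widehat{z} = b + 0 = b. \]
Hence $\widehat{x} + \widehat{z} \in \mathcal{L}(A, b)$, which shows that $\widehat{x} + \mathcal{L}(A, 0) \subseteq \mathcal{L}(A, b)$.
Let now $\widehat{x}_1 \in \mathcal{L}(A, b)$ and let $\widehat{z} := \widehat{x}_1 - \widehat{x}$. Then
\[ A\widehat{z} = A\widehat{x}_1 - A\widehat{x} = b - b = 0, \]
i.e., $\widehat{z} \in \mathcal{L}(A, 0)$. Hence $\widehat{x}_1 = \widehat{x} + \widehat{z} \in \widehat{x} + \mathcal{L}(A, 0)$, which shows that $\mathcal{L}(A, b) \subseteq
\widehat{x} + \mathcal{L}(A, 0)$. □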


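We will have a closer look at the set $\mathcal{L}(A, 0)$: Clearly, $0 \in \mathcal{L}(A, 0) \neq \emptyset$. If
$\widehat{z} \in \mathcal{L}(A, 0)$, then for all $\lambda \in K$ we have $A(\lambda\widehat{z}) = \lambda(A\widehat{z}) = \lambda \cdot 0 = 0$, and hence
$\lambda\widehat{z} \in \mathcal{L}(A, 0)$. Furthermore, for $\widehat{z}_1, \widehat{z}_2 \in \mathcal{L}(A, 0)$ we have
\[ A(\widehat{z}_1 + \widehat{z}_2) = A\widehat{z}_1 + A\widehat{z}_2 = 0 + 0 = 0, \]
and hence $\widehat{z}_1 + \widehat{z}_2 \in \mathcal{L}(A, 0)$. Thus, $\mathcal{L}(A, 0)$ is a nonempty subset of $K^{m,1}$ that is
closed under scalar multiplication and addition.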


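Lemma 6.3 If $A \in K^{n,m}$, $b \in K^{n,1}$ and $S \in K^{n,n}$, then $\mathcal{L}(A, b) \subseteq \mathcal{L}(SA, Sb)$.
Moreover, if $S$ is invertible, then $\mathcal{L}(A, b) = \mathcal{L}(SA, Sb)$.


Proof If $\widehat{x} \in \mathcal{L}(A, b)$, then also $SA\widehat{x} = Sb$, and thus $\widehat{x} \in \mathcal{L}(SA, Sb)$, which
shows that $\mathcal{L}(A, b) \subseteq \mathcal{L}(SA, Sb)$. If $S$ is invertible and $\widehat{y} \in \mathcal{L}(SA, Sb)$, then
$SA\widehat{y} = Sb$. Multiplying from the left with $S^{-1}$ yields $A\widehat{y} = b$. Hence $\widehat{y} \in \mathcal{L}(A, b)$,
which shows that $\mathcal{L}(SA, Sb) \subseteq \mathcal{L}(A, b)$. □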


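Consider the linear system of equations $Ax = b$. By Theorem 5.2 we can find
a matrix $S \in GL_n(K)$ such that $SA$ is in echelon form. Let $\widetilde{b} = [\widetilde{b}_i] := Sb$, then
$\mathcal{L}(A, b) = \mathcal{L}(SA, \widetilde{b})$ by Lemma 6.3, and the linear system $SAx = \widetilde{b}$ takes the
form
\[
\begin{bmatrix}
0 & 1 & \star & 0 & \star &        & 0      & \star \\
  &   &       & 1 & \star &        & \vdots & \vdots \\
  &   &       &   &       & \ddots & 0      & \star \\
  &   &       &   &       &        & 1      & \star \\
  &   & 0     &   &       &        &        & 0
\end{bmatrix} x = \begin{bmatrix} \widetilde{b}_1 \\ \vdots \\ \widetilde{b}_n \end{bmatrix}.
\]
Suppose that rank$(A) = r$, and let $j_1, j_2, \dots, j_r$ be the pivot columns. Using a right-multiplication
with a permutation matrix we can move the $r$ pivot columns of $SA$ to
the first $r$ columns. This is achieved by
\[
P^T := [e_{j_1}, \dots, e_{j_r}, e_1, \dots, e_{j_1-1}, e_{j_1+1}, \dots, e_{j_2-1}, e_{j_2+1}, \dots, e_{j_r-1}, e_{j_r+1}, \dots, e_m] \in K^{m,m},
\]
which yields
\[
\widetilde{A} := SAP^T = \begin{bmatrix} I_r & \widetilde{A}_{12} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix},
\]
where $\widetilde{A}_{12} \in K^{r,m-r}$. If $r = m$, then we have $\widetilde{A}_{12} = [\ ]$. This permutation leads to
a simplification of the following presentation, but it is usually omitted in practical
computations.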

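Since $P^TP = I_m$, we can write $SAx = \widetilde{b}$ as $(SAP^T)(Px) = \widetilde{b}$, or $\widetilde{A}y = \widetilde{b}$ with $y := Px$,
which has the form
\[
\begin{bmatrix} I_r & \widetilde{A}_{12} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_r \\ y_{r+1} \\ \vdots \\ y_m \end{bmatrix}
= \begin{bmatrix} \widetilde{b}_1 \\ \vdots \\ \widetilde{b}_r \\ \widetilde{b}_{r+1} \\ \vdots \\ \widetilde{b}_n \end{bmatrix}, \qquad y = Px, \quad \widetilde{b} = Sb. \tag{6.1}
\]
The left-multiplication of $x$ with $P$ just means a different ordering of the unknowns
$x_1, \dots, x_m$. Thus, the solutions of $Ax = b$ can be easily recovered from the solutions
of $\widetilde{A}y = \widetilde{b}$, and vice versa: We have $\widehat{y} \in \mathcal{L}(\widetilde{A}, \widetilde{b})$ if and only if $\widehat{x} := P^T\widehat{y} \in
\mathcal{L}(SA, \widetilde{b}) = \mathcal{L}(A, b)$.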

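The solutions of (6.1) can now be determined using the extended coefficient matrix
\[ [\widetilde{A}, \widetilde{b}] \in K^{n,m+1}, \]
which is obtained by attaching $\widetilde{b}$ as an extra column to $\widetilde{A}$. Note that rank$(\widetilde{A}) \le$
rank$([\widetilde{A}, \widetilde{b}])$, with equality if and only if $\widetilde{b}_{r+1} = \dots = \widetilde{b}_n = 0$.
If rank$(\widetilde{A}) <$ rank$([\widetilde{A}, \widetilde{b}])$, then at least one of $\widetilde{b}_{r+1}, \dots, \widetilde{b}_n$ is nonzero. Then
(6.1) cannot have a solution, since all entries of $\widetilde{A}$ in the rows $r + 1, \dots, n$ are zero.
If, on the other hand, rank$(\widetilde{A})$ = rank$([\widetilde{A}, \widetilde{b}])$, then $\widetilde{b}_{r+1} = \dots = \widetilde{b}_n = 0$ and
(6.1) can be written as
\[
\begin{bmatrix} y_1 \\ \vdots \\ y_r \end{bmatrix} = \begin{bmatrix} \widetilde{b}_1 \\ \vdots \\ \widetilde{b}_r \end{bmatrix} - \widetilde{A}_{12} \begin{bmatrix} y_{r+1} \\ \vdots \\ y_m \end{bmatrix}. \tag{6.2}
\]
This representation implies, in particular, that
\[
b^{(m)} := [\widetilde{b}_1, \dots, \widetilde{b}_r, \underbrace{0, \dots, 0}_{m-r}]^T \in \mathcal{L}(\widetilde{A}, \widetilde{b}) \neq \emptyset.
\]
From Lemma 6.2 we know that $\mathcal{L}(\widetilde{A}, \widetilde{b}) = b^{(m)} + \mathcal{L}(\widetilde{A}, 0)$. In order to determine
$\mathcal{L}(\widetilde{A}, 0)$ we set $\widetilde{b}_1 = \dots = \widetilde{b}_r = 0$ in (6.2), which yields
\[
\mathcal{L}(\widetilde{A}, 0) = \big\{ [\widehat{y}_1, \dots, \widehat{y}_m]^T \;\big|\; \widehat{y}_{r+1}, \dots, \widehat{y}_m \text{ arbitrary and }
[\widehat{y}_1, \dots, \widehat{y}_r]^T = -\widetilde{A}_{12}[\widehat{y}_{r+1}, \dots, \widehat{y}_m]^T \big\}. \tag{6.3}
\]
If $r = m$, then $\widetilde{A}_{12} = [\ ]$, $\mathcal{L}(\widetilde{A}, 0) = \{0\}$ and thus $|\mathcal{L}(\widetilde{A}, \widetilde{b})| = 1$, i.e., the solution
of $\widetilde{A}y = \widetilde{b}$ is uniquely determined.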


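Example 6.4 For the extended coefficient matrix
\[
[\widetilde{A}, \widetilde{b}] = \left[\begin{array}{ccc|c} 1 & 0 & 3 & \widetilde{b}_1 \\ 0 & 1 & 4 & \widetilde{b}_2 \\ 0 & 0 & 0 & \widetilde{b}_3 \end{array}\right] \in \mathbb{Q}^{3,4}
\]
we have rank$(\widetilde{A})$ = rank$([\widetilde{A}, \widetilde{b}])$ if and only if $\widetilde{b}_3 = 0$. If $\widetilde{b}_3 \neq 0$, then $\mathcal{L}(\widetilde{A}, \widetilde{b}) = \emptyset$.
If $\widetilde{b}_3 = 0$, then $\widetilde{A}y = \widetilde{b}$ can be written as
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \widetilde{b}_1 \\ \widetilde{b}_2 \end{bmatrix} - \begin{bmatrix} 3 \\ 4 \end{bmatrix} [y_3].
\]
Hence, $b^{(3)} = [\widetilde{b}_1, \widetilde{b}_2, 0]^T \in \mathcal{L}(\widetilde{A}, \widetilde{b})$ and
\[
\mathcal{L}(\widetilde{A}, 0) = \big\{ [\widehat{y}_1, \widehat{y}_2, \widehat{y}_3]^T \;\big|\; \widehat{y}_3 \text{ arbitrary and } [\widehat{y}_1, \widehat{y}_2]^T = -[3, 4]^T [\widehat{y}_3] \big\}.
\]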


Summarizing our considerations we have the following algorithm for solving a 
linear system of equations. 


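Algorithm 6.5 Let $A \in K^{n,m}$ and $b \in K^{n,1}$ be given.


(1) Determine $S \in GL_n(K)$ such that $SA$ is in echelon form and define $\widetilde{b} := Sb$.
(2a) If rank$(SA) <$ rank$([SA, \widetilde{b}])$, then $\mathcal{L}(SA, \widetilde{b}) = \mathcal{L}(A, b) = \emptyset$.
(2b) If $r$ = rank$(SA)$ = rank$([SA, \widetilde{b}])$, then define $\widetilde{A} := SAP^T$ as in (6.1).
We have $b^{(m)} \in \mathcal{L}(\widetilde{A}, \widetilde{b})$ and $\mathcal{L}(\widetilde{A}, \widetilde{b}) = b^{(m)} + \mathcal{L}(\widetilde{A}, 0)$, where $\mathcal{L}(\widetilde{A}, 0)$ is
determined as in (6.3), as well as $\mathcal{L}(A, b) = \{P^T\widehat{y} \mid \widehat{y} \in \mathcal{L}(\widetilde{A}, \widetilde{b})\}$.


Since rank$(A)$ = rank$(SA)$ = rank$(\widetilde{A})$ and rank$([A, b])$ = rank$([SA, \widetilde{b}])$ =
rank$([\widetilde{A}, \widetilde{b}])$, the discussion above also yields the following result about the different
cases of the solvability of a linear system of equations.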
Corollary 6.6 For $A \in K^{n,m}$ and $b \in K^{n,1}$ the following assertions hold:

(1) If rank$(A) <$ rank$([A, b])$, then $\mathcal{L}(A, b) = \emptyset$.

(2) If rank$(A)$ = rank$([A, b]) = m$, then $|\mathcal{L}(A, b)| = 1$ (i.e., there exists a unique
solution).

(3) If rank$(A)$ = rank$([A, b]) < m$, then there exist many solutions. If the field $K$
has infinitely many elements (e.g., when $K = \mathbb{Q}$, $K = \mathbb{R}$ or $K = \mathbb{C}$), then there
exist infinitely many pairwise distinct solutions.
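
The three cases can be illustrated with small systems over $K = \mathbb{Q}$ with $n = m = 2$; these systems are chosen here for illustration and are not taken from the text:

```latex
\[
\text{(1)}\quad
[A, b] = \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & 1 & 2 \end{array}\right]:
\quad \mathrm{rank}(A) = 1 < 2 = \mathrm{rank}([A, b]), \quad \mathcal{L}(A, b) = \emptyset,
\]
\[
\text{(2)}\quad
[A, b] = \left[\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & -1 & 0 \end{array}\right]:
\quad \mathrm{rank}(A) = \mathrm{rank}([A, b]) = 2 = m, \quad \mathcal{L}(A, b) = \{[1, 1]^T\},
\]
\[
\text{(3)}\quad
[A, b] = \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 2 & 2 \end{array}\right]:
\quad \mathrm{rank}(A) = \mathrm{rank}([A, b]) = 1 < m, \quad
\mathcal{L}(A, b) = \{[1, 0]^T + t\,[-1, 1]^T \mid t \in \mathbb{Q}\}.
\]
```

In case (3) the vector $[1, 0]^T$ is one particular solution and $[-1, 1]^T$ spans $\mathcal{L}(A, 0)$, in agreement with Lemma 6.2.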


The different cases in Corollary 6.6 will be studied again in Example 10.8. 


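Example 6.7 Let $K = \mathbb{Q}$ and consider the linear system of equations $Ax = b$ with

\[
A = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 0 & 3 \\ 1 & 0 & 3 & 0 \\ 2 & 3 & 5 & 4 \\ 1 & 1 & 3 & 3 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 3 \\ 2 \end{bmatrix}.
\]

We form $[A, b]$ and apply the Gaussian elimination algorithm in order to transform
$A$ into echelon form:

\[
[A, b] \rightsquigarrow
\left[\begin{array}{rrrr|r} 1 & 2 & 2 & 1 & 1 \\ 0 & 1 & 0 & 3 & 0 \\ 0 & -2 & 1 & -1 & 1 \\ 0 & -1 & 1 & 2 & 1 \\ 0 & -1 & 1 & 2 & 1 \end{array}\right]
\rightsquigarrow
\left[\begin{array}{rrrr|r} 1 & 2 & 2 & 1 & 1 \\ 0 & 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 5 & 1 \\ 0 & 0 & 1 & 5 & 1 \\ 0 & 0 & 1 & 5 & 1 \end{array}\right]
\]
\[
\rightsquigarrow
\left[\begin{array}{rrrr|r} 1 & 2 & 2 & 1 & 1 \\ 0 & 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 5 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\rightsquigarrow
\left[\begin{array}{rrrr|r} 1 & 0 & 2 & -5 & 1 \\ 0 & 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 5 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]
\[
\rightsquigarrow
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -15 & -1 \\ 0 & 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 5 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
= [SA \mid \tilde b].
\]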


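Here $\operatorname{rank}(SA) = \operatorname{rank}([SA, \tilde b]) = 3$, and hence there exist solutions. The pivot
columns are $j_i = i$ for $i = 1, 2, 3$, so that $P = P^T = I_4$, and $\widetilde{A} = SA$. Now
$SAx = \tilde b$ can be written as

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} - \begin{bmatrix} -15 \\ 3 \\ 5 \end{bmatrix} [x_4].
\]

Consequently, $\tilde b^{(4)} = [-1, 0, 1, 0]^T \in \mathcal{L}(A, b)$ and $\mathcal{L}(A, b) = \tilde b^{(4)} + \mathcal{L}(A, 0)$,
where

\[
\mathcal{L}(A, 0) = \big\{\, [\hat x_1, \dots, \hat x_4]^T \;\big|\; \hat x_4 \text{ arbitrary and } [\hat x_1, \hat x_2, \hat x_3]^T = -[-15, 3, 5]^T [\hat x_4] \,\big\}.
\]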


Exercises 


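6.1 Find a field $K$ and matrices $A \in K^{n,m}$, $S \in K^{n,n}$ and $b \in K^{n,1}$ with $\mathcal{L}(A, b) \neq \mathcal{L}(SA, Sb)$.
6.2 Determine $\mathcal{L}(A, b)$ for the following $A$ and $b$:

\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 6 \end{bmatrix} \in \mathbb{R}^{3,3}, \quad b = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \in \mathbb{R}^{3,1},
\]
\[
A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 2 & -1 & -1 \\ 1 & -1 & 6 & 2 \end{bmatrix} \in \mathbb{R}^{3,4}, \quad b = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \in \mathbb{R}^{3,1},
\]
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 6 \\ 1 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{4,3}, \quad b = \begin{bmatrix} 1 \\ -2 \\ 3 \\ 1 \end{bmatrix} \in \mathbb{R}^{4,1},
\]
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 6 \\ 1 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{4,3}, \quad b = \begin{bmatrix} 1 \\ -2 \\ 3 \\ 0 \end{bmatrix} \in \mathbb{R}^{4,1}.
\]

6.3 Let $\alpha \in \mathbb{Q}$,

\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 0 \end{bmatrix} \in \mathbb{Q}^{3,3}, \quad b_\alpha = \begin{bmatrix} 6 \\ 3 \\ \alpha \end{bmatrix} \in \mathbb{Q}^{3,1}.
\]

Determine $\mathcal{L}(A, 0)$ and $\mathcal{L}(A, b_\alpha)$ in dependence of $\alpha$.

6.4 Let $A \in K^{n,m}$ and $B \in K^{n,s}$. For $i = 1, \dots, s$ denote by $b_i$ the $i$th column of
$B$. Show that the linear system of equations $AX = B$ has at least one solution
$X \in K^{m,s}$ if and only if

\[
\operatorname{rank}(A) = \operatorname{rank}([A, b_1]) = \operatorname{rank}([A, b_2]) = \dots = \operatorname{rank}([A, b_s]).
\]

Find conditions under which this solution is unique.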
6.5 Let
0 6 


A= oe ek™, b= | : lek 


a, O 


be given with @;, a; Æ O for all i. Determine a recursive formula for the entries 
of the solution of the linear system Ax = b. 


Chapter 7 
Determinants of Matrices 


The determinant is a map that assigns to every square matrix $A \in R^{n,n}$, where $R$ is
a commutative ring with unit, an element of $R$. This map has very interesting and
important properties. For instance, it yields a necessary and sufficient condition for
the invertibility of $A \in R^{n,n}$. Moreover, it forms the basis for the definition of the
characteristic polynomial of a matrix in Chap. 8.


7.1 Definition of the Determinant 


There are several different approaches to define the determinant of a matrix. We use 
the constructive approach via permutations. 


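Definition 7.1 Let $n \in \mathbb{N}$ be given. A bijective map

\[
\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}, \quad j \mapsto \sigma(j),
\]

is called a permutation of the numbers $\{1, 2, \dots, n\}$. We denote the set of all these
maps by $S_n$.


A permutation $\sigma \in S_n$ can be written in the form

\[
[\sigma(1)\ \sigma(2)\ \dots\ \sigma(n)].
\]

For example $S_1 = \{[1]\}$, $S_2 = \{[1\ 2], [2\ 1]\}$, and

\[
S_3 = \{\, [1\ 2\ 3],\ [1\ 3\ 2],\ [2\ 1\ 3],\ [2\ 3\ 1],\ [3\ 1\ 2],\ [3\ 2\ 1] \,\}.
\]

From Lemma 2.17 we know that $|S_n| = n! = 1 \cdot 2 \cdot \ldots \cdot n$.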
The set $S_n$ with the composition of maps "$\circ$" forms a group (cp. Exercise 3.3),
which is sometimes called the symmetric group. The neutral element in this group is
the permutation $[1\ 2\ \dots\ n]$.
While $S_1$ and $S_2$ are commutative groups, the group $S_n$ for $n \geq 3$ is non-commutative.
As an example consider $n = 3$ and the permutations $\sigma_1 = [2\ 3\ 1]$,
$\sigma_2 = [1\ 3\ 2]$. Then

\[
\begin{aligned}
\sigma_1 \circ \sigma_2 &= [\sigma_1(\sigma_2(1))\ \sigma_1(\sigma_2(2))\ \sigma_1(\sigma_2(3))] = [\sigma_1(1)\ \sigma_1(3)\ \sigma_1(2)] = [2\ 1\ 3], \\
\sigma_2 \circ \sigma_1 &= [\sigma_2(\sigma_1(1))\ \sigma_2(\sigma_1(2))\ \sigma_2(\sigma_1(3))] = [\sigma_2(2)\ \sigma_2(3)\ \sigma_2(1)] = [3\ 2\ 1].
\end{aligned}
\]
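
The composition of permutations in one-line notation is easy to check by machine. The following lines are a minimal sketch; the helper name `compose` is hypothetical and permutations are stored as Python lists with entries 1, ..., n.

```python
def compose(s1, s2):
    """Return s1 ∘ s2 in one-line notation, i.e. the map j -> s1(s2(j))."""
    return [s1[s2[j] - 1] for j in range(len(s2))]


sigma1 = [2, 3, 1]
sigma2 = [1, 3, 2]
print(compose(sigma1, sigma2))  # [2, 1, 3]
print(compose(sigma2, sigma1))  # [3, 2, 1]
```

The two results differ, which illustrates that the order of composition matters in $S_3$.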


Definition 7.2 Let $n \geq 2$ and $\sigma \in S_n$. A pair $(\sigma(i), \sigma(j))$ with $1 \leq i < j \leq n$ and
$\sigma(i) > \sigma(j)$ is called an inversion of $\sigma$. If $k$ is the number of inversions of $\sigma$, then
$\operatorname{sgn}(\sigma) := (-1)^k$ is called the sign of $\sigma$. For $n = 1$ we define $\operatorname{sgn}([1]) := 1 = (-1)^0$.


In short, an inversion of a permutation $\sigma$ is a pair that is "out of order". The term
inversion should not be confused with the inverse map $\sigma^{-1}$ (which exists, since $\sigma$ is
bijective). The sign of a permutation is sometimes also called the signature.


Example 7.3 The permutation $[2\ 3\ 1\ 4] \in S_4$ has the inversions $(2, 1)$ and $(3, 1)$,
so that $\operatorname{sgn}([2\ 3\ 1\ 4]) = 1$. The permutation $[4\ 1\ 2\ 3] \in S_4$ has the inversions $(4, 1)$,
$(4, 2)$, $(4, 3)$, so that $\operatorname{sgn}([4\ 1\ 2\ 3]) = -1$.
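
Counting inversions as in Definition 7.2 can be sketched in a few lines of Python. The function names `inversions` and `sgn` are chosen here for illustration; the quadratic loop over all pairs is the direct translation of the definition, not an optimized method.

```python
def inversions(sigma):
    """List all pairs (sigma(i), sigma(j)) with i < j and sigma(i) > sigma(j)."""
    n = len(sigma)
    return [(sigma[i], sigma[j])
            for i in range(n) for j in range(i + 1, n)
            if sigma[i] > sigma[j]]


def sgn(sigma):
    """Sign of a permutation: (-1) to the number of inversions."""
    return (-1) ** len(inversions(sigma))


print(inversions([2, 3, 1, 4]), sgn([2, 3, 1, 4]))  # [(2, 1), (3, 1)] 1
print(inversions([4, 1, 2, 3]), sgn([4, 1, 2, 3]))  # [(4, 1), (4, 2), (4, 3)] -1
```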


We can now define the determinant map. 


Definition 7.4 Let $R$ be a commutative ring with unit and let $n \in \mathbb{N}$. The map

\[
\det : R^{n,n} \to R, \quad A = [a_{ij}] \mapsto \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} \tag{7.1}
\]

is called the determinant, and the ring element $\det(A)$ is called the determinant of $A$.


The formula (7.1) for $\det(A)$ is called the signature formula of Leibniz.¹ The term
$\operatorname{sgn}(\sigma)$ in this definition is to be interpreted as an element of the ring $R$, i.e., either
$\operatorname{sgn}(\sigma) = 1 \in R$ or $\operatorname{sgn}(\sigma) = -1 \in R$, where $-1 \in R$ is the unique additive inverse
of the unit $1 \in R$.


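Example 7.5 For $n = 1$ we have $A = [a_{11}]$ and thus $\det(A) = \operatorname{sgn}([1])\, a_{11} = a_{11}$.
For $n = 2$ we get

\[
\det(A) = \det\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) = \operatorname{sgn}([1\ 2])\, a_{11} a_{22} + \operatorname{sgn}([2\ 1])\, a_{12} a_{21} = a_{11} a_{22} - a_{12} a_{21}.
\]

For $n = 3$ we have the Sarrus rule²:

\[
\det(A) = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}.
\]

¹ Gottfried Wilhelm Leibniz (1646-1716).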


In order to compute $\det(A)$ using the signature formula of Leibniz we have to
form $n!$ products with $n$ factors each. For large $n$ this is too costly even on modern
computers. As we will see in Corollary 7.16, there are more efficient ways for
computing $\det(A)$. The signature formula is mostly of theoretical relevance, since it
represents the determinant of $A$ explicitly in terms of the entries of $A$. Considering
the $n^2$ entries as variables, we can interpret $\det(A)$ as a polynomial in these variables.
If $R = \mathbb{R}$ or $R = \mathbb{C}$, then standard techniques of Analysis show that $\det(A)$ is a
continuous function of the entries of $A$.
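
For small matrices the signature formula (7.1) can be evaluated literally. The following sketch does so for matrices with integer entries; the names `sgn` and `det_leibniz` are hypothetical, permutations are 0-based tuples from `itertools.permutations`, and the loop forms all $n!$ products, so it is meant as an illustration of the formula rather than as a practical method.

```python
from itertools import permutations


def sgn(perm):
    """Sign of a 0-based permutation tuple, via its number of inversions."""
    n = len(perm)
    k = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return (-1) ** k


def det_leibniz(A):
    """Determinant by the Leibniz formula: sum over all permutations."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sgn(perm)
        for i in range(n):
            term *= A[i][perm[i]]  # factor a_{i, sigma(i)}
        total += term
    return total


print(det_leibniz([[1, 2], [3, 4]]))                   # -2
print(det_leibniz([[1, 2, 3], [0, 4, 5], [0, 0, 6]]))  # 24
```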

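We will now study the group of permutations in more detail. The permutation
$\sigma = [3\ 2\ 1] \in S_3$ has the inversions $(3, 2)$, $(3, 1)$ and $(2, 1)$, so that $\operatorname{sgn}(\sigma) = -1$.
Moreover,

\[
\prod_{1 \leq i < j \leq 3} \frac{\sigma(j) - \sigma(i)}{j - i}
= \frac{\sigma(2) - \sigma(1)}{2 - 1} \cdot \frac{\sigma(3) - \sigma(1)}{3 - 1} \cdot \frac{\sigma(3) - \sigma(2)}{3 - 2}
= \frac{2 - 3}{2 - 1} \cdot \frac{1 - 3}{3 - 1} \cdot \frac{1 - 2}{3 - 2}
= (-1)^3 = -1 = \operatorname{sgn}(\sigma).
\]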


This observation can be generalized as follows. 


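Lemma 7.6 For each $\sigma \in S_n$ we have

\[
\operatorname{sgn}(\sigma) = \prod_{1 \leq i < j \leq n} \frac{\sigma(j) - \sigma(i)}{j - i}. \tag{7.2}
\]

Proof If $n = 1$, then the right hand side of (7.2) is an empty product, which is defined
to be 1 (cp. Sect. 3.2), so that (7.2) holds for $n = 1$.

Let $n > 1$ and $\sigma \in S_n$ with $\operatorname{sgn}(\sigma) = (-1)^k$, i.e., $k$ is the number of pairs
$(\sigma(i), \sigma(j))$ with $i < j$ but $\sigma(i) > \sigma(j)$. Then

\[
\prod_{1 \leq i < j \leq n} (\sigma(j) - \sigma(i)) = (-1)^k \prod_{1 \leq i < j \leq n} |\sigma(j) - \sigma(i)| = (-1)^k \prod_{1 \leq i < j \leq n} (j - i).
\]

In the last equation we have used the fact that the two products have the same factors
(except possibly for their order). □


² Pierre Frédéric Sarrus (1798-1861).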
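Theorem 7.7 For all $\sigma_1, \sigma_2 \in S_n$ we have $\operatorname{sgn}(\sigma_1 \circ \sigma_2) = \operatorname{sgn}(\sigma_1) \operatorname{sgn}(\sigma_2)$. In
particular, $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma)$ for all $\sigma \in S_n$.


Proof By Lemma 7.6 we have

\[
\begin{aligned}
\operatorname{sgn}(\sigma_1 \circ \sigma_2)
&= \prod_{1 \leq i < j \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{j - i} \\
&= \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{\sigma_2(j) - \sigma_2(i)} \right) \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_2(j) - \sigma_2(i)}{j - i} \right) \\
&= \left( \prod_{1 \leq \sigma_2(i) < \sigma_2(j) \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{\sigma_2(j) - \sigma_2(i)} \right) \operatorname{sgn}(\sigma_2) \\
&= \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_1(j) - \sigma_1(i)}{j - i} \right) \operatorname{sgn}(\sigma_2) \\
&= \operatorname{sgn}(\sigma_1) \operatorname{sgn}(\sigma_2).
\end{aligned}
\]

For each $\sigma \in S_n$ we have $1 = \operatorname{sgn}([1\ 2\ \dots\ n]) = \operatorname{sgn}(\sigma \circ \sigma^{-1}) = \operatorname{sgn}(\sigma) \operatorname{sgn}(\sigma^{-1})$,
so that $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})$. □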


Theorem 7.7 shows that the map sgn is a homomorphism between the groups
$(S_n, \circ)$ and $(\{1, -1\}, \cdot)$, where the operation in the second group is the standard
multiplication of the integers $1$ and $-1$.


Definition 7.8 A transposition is a permutation $\tau \in S_n$, $n \geq 2$, that exchanges
exactly two distinct elements $k, \ell \in \{1, 2, \dots, n\}$, i.e., $\tau(k) = \ell$, $\tau(\ell) = k$ and
$\tau(j) = j$ for all $j \in \{1, 2, \dots, n\} \setminus \{k, \ell\}$.


Obviously $\tau^{-1} = \tau$ for every transposition $\tau \in S_n$.


Lemma 7.9 Let $\tau \in S_n$ be the transposition that exchanges $k$ and $\ell$ for some
$1 \leq k < \ell \leq n$. Then $\tau$ has exactly $2(\ell - k) - 1$ inversions and, hence, $\operatorname{sgn}(\tau) = -1$.


Proof We have $\ell = k + j$ for a $j \geq 1$ and thus $\tau$ is given by

\[
\tau = [1, \dots, k-1,\ k+j,\ k+1, \dots, k+(j-1),\ k,\ \ell+1, \dots, n],
\]

where the points denote values of $\tau$ in increasing and thus "correct" order. A simple
counting argument shows that $\tau$ has exactly $2j - 1 = 2(\ell - k) - 1$ inversions. □
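
The counting argument in Lemma 7.9 can be checked by brute force for small $n$. The sketch below uses two hypothetical helpers, `transposition` and `count_inversions`, and is a numerical sanity check, not a proof.

```python
def transposition(n, k, l):
    """One-line notation of the transposition in S_n that exchanges k and l."""
    tau = list(range(1, n + 1))
    tau[k - 1], tau[l - 1] = l, k
    return tau


def count_inversions(sigma):
    """Number of pairs i < j with sigma(i) > sigma(j)."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])


n = 6
for k in range(1, n + 1):
    for l in range(k + 1, n + 1):
        # Lemma 7.9 predicts 2(l - k) - 1 inversions, an odd number.
        assert count_inversions(transposition(n, k, l)) == 2 * (l - k) - 1
print(transposition(4, 1, 3), count_inversions(transposition(4, 1, 3)))  # [3, 2, 1, 4] 3
```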


7.2 Properties of the Determinant 
In this section we prove important properties of the determinant map. 


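Lemma 7.10 For $A \in R^{n,n}$ the following assertions hold:


(1) For $\lambda \in R$,
\[
\det\left( \begin{bmatrix} \lambda & \star \\ 0 & A \end{bmatrix} \right) = \det\left( \begin{bmatrix} \lambda & 0 \\ \star & A \end{bmatrix} \right) = \lambda \det(A).
\]

(2) If $A = [a_{ij}]$ is upper or lower triangular, then $\det(A) = \prod_{i=1}^{n} a_{ii}$.

(3) If $A$ has a zero row or column, then $\det(A) = 0$.

(4) If $n \geq 2$ and $A$ has two equal rows or two equal columns, then $\det(A) = 0$.

(5) $\det(A) = \det(A^T)$.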


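Proof


(1) Exercise.

(2) This follows by an application of (1) to the upper (or lower) triangular matrix $A$.

(3) If $A$ has a zero row or column, then for every $\sigma \in S_n$ at least one factor in the
product $\prod_{i=1}^{n} a_{i,\sigma(i)}$ is equal to zero and thus $\det(A) = 0$.

(4) Let the rows $k$ and $\ell$, with $k < \ell$, of $A = [a_{ij}]$ be equal, i.e., $a_{kj} = a_{\ell j}$ for
$j = 1, \dots, n$. Let $\tau \in S_n$ be the transposition that exchanges the elements $k$ and
$\ell$, and let

\[
T_n := \{ \sigma \in S_n \mid \sigma(k) < \sigma(\ell) \}.
\]

Since the set $T_n$ contains all permutations $\sigma \in S_n$ for which $\sigma(k) < \sigma(\ell)$, we
have $|T_n| = |S_n|/2$ and

\[
S_n \setminus T_n = \{ \sigma \circ \tau \mid \sigma \in T_n \}.
\]

Moreover,

\[
a_{i,(\sigma \circ \tau)(i)} = \begin{cases} a_{i,\sigma(i)}, & i \neq k, \ell, \\ a_{k,\sigma(\ell)}, & i = k, \\ a_{\ell,\sigma(k)}, & i = \ell. \end{cases}
\]

We have $a_{k,\sigma(\ell)} = a_{\ell,\sigma(\ell)}$ and $a_{\ell,\sigma(k)} = a_{k,\sigma(k)}$. Thus, using Theorem 7.7 and
Lemma 7.9, we obtain

\[
\begin{aligned}
\sum_{\sigma \in S_n \setminus T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}
&= \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma \circ \tau) \prod_{i=1}^{n} a_{i,(\sigma \circ \tau)(i)} \\
&= \sum_{\sigma \in T_n} (-\operatorname{sgn}(\sigma)) \prod_{i=1}^{n} a_{i,(\sigma \circ \tau)(i)} \\
&= - \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}.
\end{aligned}
\]

This implies

\[
\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}
= \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} + \sum_{\sigma \in S_n \setminus T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} = 0.
\]

The proof for the case of two equal columns is analogous.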
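(5) We observe first that

\[
\{ (\sigma(i), i) \mid 1 \leq i \leq n \} = \{ (i, \sigma^{-1}(i)) \mid 1 \leq i \leq n \}
\]

for every $\sigma \in S_n$. To see this, let $i$ with $1 \leq i \leq n$ be fixed. Then $\sigma(i) = j$ if and
only if $i = \sigma^{-1}(j)$. Thus, $(\sigma(i), i) = (j, i)$ is an element of the first set if and
only if $(j, \sigma^{-1}(j)) = (j, i)$ is an element of the second set. Since $\sigma$ is bijective,
the two sets are equal.

Let $A = [a_{ij}]$ and $A^T = [b_{ij}]$ with $b_{ij} = a_{ji}$. Then

\[
\begin{aligned}
\det(A^T) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} b_{i,\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{\sigma(i),i} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{i=1}^{n} a_{\sigma(i),i} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{i=1}^{n} a_{i,\sigma^{-1}(i)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} = \det(A).
\end{aligned}
\]

Here we have used that $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})$ (cp. Theorem 7.7) and the fact that
the two products $\prod_{i=1}^{n} a_{\sigma(i),i}$ and $\prod_{i=1}^{n} a_{i,\sigma^{-1}(i)}$ have the same factors. □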


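Example 7.11 For the matrices

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 3 & 0 \\ 1 & 4 & 0 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 3 \\ 1 & 1 & 4 \end{bmatrix}
\]

from $\mathbb{Z}^{3,3}$ we obtain $\det(A) = 1 \cdot 4 \cdot 6 = 24$ by (2) in Lemma 7.10, and $\det(B) = \det(C) = 0$
by (3) and (4) in Lemma 7.10. We may also compute these determinants
using the Sarrus rule from Example 7.5.


Item (2) in Lemma 7.10 shows in particular that $\det(I_n) = 1$ for the identity
matrix $I_n = [e_1, e_2, \dots, e_n] \in R^{n,n}$. For this reason the determinant map is called
normalized.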
For $\sigma \in S_n$ the matrix

\[
P_\sigma := [e_{\sigma(1)}, e_{\sigma(2)}, \dots, e_{\sigma(n)}]
\]

is called the permutation matrix associated with $\sigma$. This map from the group $S_n$ to
the group of permutation matrices in $R^{n,n}$ is bijective. The inverse of a permutation
matrix is its transpose (cp. Theorem 4.16) and we can easily check that

\[
P_\sigma^{-1} = P_\sigma^T = P_{\sigma^{-1}}.
\]

If $A = [a_1, a_2, \dots, a_n] \in R^{n,n}$, i.e., $a_j \in R^{n,1}$ is the $j$th column of $A$, then

\[
A P_\sigma = [a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}],
\]

i.e., the right-multiplication of $A$ with $P_\sigma$ exchanges the columns of $A$ according to
the permutation $\sigma$. If, on the other hand, $a_i \in R^{1,n}$ is the $i$th row of $A$, then

\[
P_\sigma^T A = \begin{bmatrix} a_{\sigma(1)} \\ a_{\sigma(2)} \\ \vdots \\ a_{\sigma(n)} \end{bmatrix},
\]

i.e., the left-multiplication of $A$ by $P_\sigma^T$ exchanges the rows of $A$ according to the
permutation $\sigma$.
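
The action of a permutation matrix on columns and rows can be tried out on a small example. In this sketch the helpers `perm_matrix`, `matmul` and `transpose` are hypothetical names, matrices are plain nested lists, and the entries of the test matrix encode their own position (entry 23 sits in row 2, column 3) to make the reordering visible.

```python
def perm_matrix(sigma):
    """P_sigma = [e_sigma(1), ..., e_sigma(n)]: column j is the unit vector e_sigma(j)."""
    n = len(sigma)
    return [[1 if sigma[j] == i + 1 else 0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


sigma = [2, 3, 1]
P = perm_matrix(sigma)
A = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
print(matmul(A, P))             # columns of A in the order 2, 3, 1
print(matmul(transpose(P), A))  # rows of A in the order 2, 3, 1
```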
We next study the determinants of the elementary matrices. 


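Lemma 7.12 (1) For $\sigma \in S_n$ and the associated permutation matrix $P_\sigma \in R^{n,n}$ we
have $\operatorname{sgn}(\sigma) = \det(P_\sigma)$. If $n \geq 2$ and $P_{ij}$ is defined as in (5.1), then $\det(P_{ij}) = -1$.

(2) If $M_i(\lambda)$ and $G_{ij}(\lambda)$ are defined as in (5.2) and (5.3), respectively, then
$\det(M_i(\lambda)) = \lambda$ and $\det(G_{ij}(\lambda)) = 1$.


Proof


(1) If $\tilde\sigma \in S_n$ and $P_{\tilde\sigma} = [a_{ij}] \in R^{n,n}$, then $a_{\tilde\sigma(j),j} = 1$ for $j = 1, 2, \dots, n$, and all
other entries of $P_{\tilde\sigma}$ are zero. Hence

\[
\det(P_{\tilde\sigma}) = \det(P_{\tilde\sigma}^T) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \underbrace{\prod_{j=1}^{n} a_{\sigma(j),j}}_{=0 \text{ for } \sigma \neq \tilde\sigma} = \operatorname{sgn}(\tilde\sigma) \underbrace{\prod_{j=1}^{n} a_{\tilde\sigma(j),j}}_{=1} = \operatorname{sgn}(\tilde\sigma).
\]

The permutation matrix $P_{ij}$ is associated with the transposition that exchanges
$i$ and $j$. Hence, $\det(P_{ij}) = -1$ follows from Lemma 7.9.

(2) Since $M_i(\lambda)$ and $G_{ij}(\lambda)$ are lower triangular matrices, the assertion follows from
(2) in Lemma 7.10. □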
These results lead to some important computational rules for determinants.


Lemma 7.13 For $A \in R^{n,n}$, $n \geq 2$, and $\lambda \in R$ the following assertions hold:


(1) The multiplication of a row of $A$ by $\lambda$ leads to the multiplication of $\det(A)$ by $\lambda$:
$\det(M_i(\lambda) A) = \lambda \det(A) = \det(M_i(\lambda)) \det(A)$.

(2) The addition of the $\lambda$-multiple of a row of $A$ to another row of $A$ does not change
$\det(A)$:
$\det(G_{ij}(\lambda) A) = \det(A) = \det(G_{ij}(\lambda)) \det(A)$, and
$\det(G_{ij}(\lambda)^T A) = \det(A) = \det(G_{ij}(\lambda)^T) \det(A)$.

(3) Exchanging two rows of $A$ changes the sign of $\det(A)$:
$\det(P_{ij} A) = -\det(A) = \det(P_{ij}) \det(A)$.

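Proof
(1) If $A = [a_{mk}]$ and $\widetilde{A} = M_i(\lambda) A = [\tilde a_{mk}]$, then

\[
\tilde a_{mk} = \begin{cases} a_{mk}, & m \neq i, \\ \lambda a_{mk}, & m = i, \end{cases}
\]

and hence

\[
\det(\widetilde{A}) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{m=1}^{n} \tilde a_{m,\sigma(m)}
= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \underbrace{\tilde a_{i,\sigma(i)}}_{= \lambda a_{i,\sigma(i)}} \prod_{\substack{m=1 \\ m \neq i}}^{n} \underbrace{\tilde a_{m,\sigma(m)}}_{= a_{m,\sigma(m)}}
= \lambda \det(A).
\]

(2) If $A = [a_{mk}]$ and $\widetilde{A} = G_{ij}(\lambda) A = [\tilde a_{mk}]$, then

\[
\tilde a_{mk} = \begin{cases} a_{mk}, & m \neq j, \\ a_{jk} + \lambda a_{ik}, & m = j, \end{cases}
\]

and hence

\[
\begin{aligned}
\det(\widetilde{A}) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \left( a_{j,\sigma(j)} + \lambda a_{i,\sigma(j)} \right) \prod_{\substack{m=1 \\ m \neq j}}^{n} a_{m,\sigma(m)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{m=1}^{n} a_{m,\sigma(m)} + \lambda \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{i,\sigma(j)} \prod_{\substack{m=1 \\ m \neq j}}^{n} a_{m,\sigma(m)}.
\end{aligned}
\]

The first term is equal to $\det(A)$. The second sum is the determinant of the matrix
obtained from $A$ by replacing row $j$ with row $i$, i.e., of a matrix with two equal rows,
and is thus equal to zero by (4) in Lemma 7.10. The proof for the matrix
$G_{ij}(\lambda)^T A$ is analogous.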
(3) The permutation matrix $P_{ij}$ exchanges rows $i$ and $j$ of $A$, where $i < j$. This
exchange can be expressed by the following four elementary row operations:
Multiply row $j$ by $-1$; add row $i$ to row $j$; add the $(-1)$-multiple of row $j$ to
row $i$; add row $i$ to row $j$. Therefore,

\[
P_{ij} = G_{ij}(1)\, (G_{ij}(-1))^T\, G_{ij}(1)\, M_j(-1).
\]

(One may verify this also by carrying out the matrix multiplications.) Using (1)
and (2) we obtain

\[
\begin{aligned}
\det(P_{ij} A) &= \det\left( G_{ij}(1)\, (G_{ij}(-1))^T\, G_{ij}(1)\, M_j(-1)\, A \right) \\
&= \det(G_{ij}(1)) \det((G_{ij}(-1))^T) \det(G_{ij}(1)) \det(M_j(-1)) \det(A) \\
&= (-1) \det(A). \qquad \Box
\end{aligned}
\]

Since $\det(A) = \det(A^T)$ (cp. (5) in Lemma 7.10), the results in Lemma 7.13 for
the rows of $A$ can be formulated analogously for the columns of $A$.


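Example 7.14 Consider the matrices

\[
A = \begin{bmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 1 & 0 \\ 2 & 1 & 0 \\ 2 & 1 & 4 \end{bmatrix} \in \mathbb{Z}^{3,3}.
\]

A simple calculation shows that $\det(A) = -4$. Since $B$ is obtained from $A$ by
exchanging the first two columns we have $\det(B) = -\det(A) = 4$.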


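The determinant map can be interpreted as a map of $(R^{n,1})^n$ to $R$, i.e., as a map of
the $n$ columns of the matrix $A \in R^{n,n}$ to the ring $R$. If $a_i, a_j \in R^{n,1}$ are two columns
of $A$,

\[
A = [\dots a_i \dots a_j \dots],
\]

then

\[
\det(A) = -\det([\dots a_j \dots a_i \dots])
\]

by (3) in Lemma 7.13. Due to this property the determinant map is called an alternating
map of the columns of $A$. Analogously, the determinant map is an alternating
map of the rows of $A$.

If the $k$th row of $A$ has the form $\lambda a^{(1)} + \mu a^{(2)}$ for some $\lambda, \mu \in R$ and
$a^{(j)} = [a^{(j)}_1, \dots, a^{(j)}_n] \in R^{1,n}$, $j = 1, 2$, then

\[
\begin{aligned}
\det(A) &= \det\left( \begin{bmatrix} \vdots \\ \lambda a^{(1)} + \mu a^{(2)} \\ \vdots \end{bmatrix} \right)
= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \left( \lambda a^{(1)}_{\sigma(k)} + \mu a^{(2)}_{\sigma(k)} \right) \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} \\
&= \lambda \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a^{(1)}_{\sigma(k)} \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} + \mu \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a^{(2)}_{\sigma(k)} \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} \\
&= \lambda \det\left( \begin{bmatrix} \vdots \\ a^{(1)} \\ \vdots \end{bmatrix} \right) + \mu \det\left( \begin{bmatrix} \vdots \\ a^{(2)} \\ \vdots \end{bmatrix} \right).
\end{aligned}
\]

This property is called the linearity of the determinant map with respect to the rows
of $A$. Analogously we have the linearity with respect to the columns of $A$. Linear
maps will be studied in detail in later chapters.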

The next result is called the multiplication theorem for determinants. 


Theorem 7.15 If $K$ is a field and $A, B \in K^{n,n}$, then $\det(AB) = \det(A) \det(B)$.
Moreover, if $A$ is invertible, then $\det(A^{-1}) = (\det(A))^{-1}$.


Proof By Theorem 5.2 we know that for $A \in K^{n,n}$ there exist invertible elementary
matrices $S_1, \dots, S_t$ such that $\widetilde{A} = S_t \cdots S_1 A$ is in echelon form. By Lemma 7.13 we
have

\[
\det(A) = \det(S_1^{-1}) \cdots \det(S_t^{-1}) \det(\widetilde{A}),
\]

as well as

\[
\det(AB) = \det\left( S_1^{-1} \cdots S_t^{-1} \widetilde{A} B \right) = \det(S_1^{-1}) \cdots \det(S_t^{-1}) \det(\widetilde{A} B).
\]

There are two cases: If $A$ is not invertible, then $\widetilde{A}$ and thus also $\widetilde{A} B$ have a zero
row. Then $\det(\widetilde{A}) = \det(\widetilde{A} B) = 0$, which implies that $\det(A) = 0$, and hence
$\det(AB) = 0 = \det(A) \det(B)$. On the other hand, if $A$ is invertible, then $\widetilde{A} = I_n$,
since $\widetilde{A}$ is in echelon form. Now $\det(I_n) = 1$ again gives $\det(AB) = \det(A) \det(B)$.

Finally, if $A$ is invertible, then $1 = \det(I_n) = \det(A A^{-1}) = \det(A) \det(A^{-1})$,
and hence $\det(A^{-1}) = (\det(A))^{-1}$. □


Since our proof relies on Theorem 5.2, which is valid for matrices over a field
$K$, we have formulated Theorem 7.15 for $A, B \in K^{n,n}$. However, the multiplication
theorem for determinants also holds for matrices over a commutative ring $R$ with unit.
A direct proof based on the signature formula of Leibniz can be found, for example,
in the book "Advanced Linear Algebra" by Loehr [Loe14, Sect. 5.13]. That book
also contains a proof of the Cauchy-Binet formula for $\det(AB)$ with $A \in R^{n,m}$ and
$B \in R^{m,n}$ for $n \leq m$. Below we will sometimes use that $\det(AB) = \det(A) \det(B)$
holds for all $A, B \in R^{n,n}$, although we have shown the result in Theorem 7.15 only
for $A, B \in K^{n,n}$.

The proof of Theorem 7.15 suggests that $\det(A)$ can be easily computed while
transforming $A \in K^{n,n}$ into its echelon form using elementary row operations.


Corollary 7.16 For $A \in K^{n,n}$ let $S_1, \dots, S_t \in K^{n,n}$ be elementary matrices, such
that $\widetilde{A} = S_t \cdots S_1 A$ is in echelon form. Then either $\widetilde{A}$ has a zero row and hence
$\det(A) = 0$, or $\widetilde{A} = I_n$ and hence $\det(A) = (\det(S_1))^{-1} \cdots (\det(S_t))^{-1}$.


As shown in Theorem 5.4, every matrix $A \in K^{n,n}$ can be factorized as $A = PLU$,
and hence $\det(A) = \det(P) \det(L) \det(U)$. The determinants of the matrices on
the right hand side are easily computed, since these are permutation and triangular
matrices. An LU-decomposition of a matrix $A$ therefore yields an efficient way to
compute $\det(A)$.
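
The idea behind Corollary 7.16 can be sketched as follows: reduce the matrix by elementary row operations and keep track of how each operation changes the determinant. The function name `det_by_elimination` is hypothetical, entries are converted to `fractions.Fraction` so that the arithmetic over the rationals is exact, and the sketch stops at an upper triangular form instead of the full echelon form, which suffices by (2) in Lemma 7.10.

```python
from fractions import Fraction


def det_by_elimination(A):
    """Determinant via row operations, using the rules of Lemma 7.13."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in the current column.
        p = next((i for i in range(col, n) if M[i][col] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot in this column: the determinant is zero
        if p != col:
            M[col], M[p] = M[p], M[col]
            det = -det  # exchanging two rows changes the sign
        det *= M[col][col]  # product of the pivots of the triangular form
        for i in range(col + 1, n):
            # Adding a multiple of one row to another leaves det unchanged.
            factor = M[i][col] / M[col][col]
            M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return det


print(det_by_elimination([[1, 3, 0], [1, 2, 0], [1, 2, 4]]))  # -4
```

This needs on the order of $n^3$ arithmetic operations, in contrast to the $n!$ products of the signature formula.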


MATLAB-Minute.

Look at the matrices wilkinson(n) for n=2,3,...,10 in MATLAB. Can you
find a general formula for their entries? For n=2,3,...,10 compute
A=wilkinson(n)

[L,U,P]=lu(A) (LU-decomposition; cp. the MATLAB-Minute above Definition 5.6)

det(L), det(U), det(P), det(P)*det(L)*det(U), det(A)

Which permutation is associated with the computed matrix P? Why is det(A)
an integer for odd n?


7.3 Minors and the Laplace Expansion 


We now show that the determinant can be used for deriving formulas for the inverse
of an invertible matrix and for the solution of linear systems of equations. These
formulas are, however, more of theoretical than practical relevance.


Definition 7.17 Let $R$ be a commutative ring with unit and let $A \in R^{n,n}$, $n \geq 2$.
Then the matrix $A(j, i) \in R^{n-1,n-1}$ that is obtained by deleting the $j$th row and $i$th
column of $A$ is called a minor³ of $A$. The matrix

\[
\operatorname{adj}(A) = [b_{ij}] \in R^{n,n} \quad \text{with} \quad b_{ij} := (-1)^{i+j} \det(A(j, i)),
\]

is called the adjunct of $A$.


The adjunct is also called adjugate or classical adjoint of $A$.


³ This term was introduced in 1850 by James Joseph Sylvester (1814-1897).
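Theorem 7.18 For $A \in R^{n,n}$, $n \geq 2$, we have

\[
A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_n.
\]

In particular $A$ is invertible if and only if $\det(A) \in R$ is invertible. In this case
$(\det(A))^{-1} = \det(A^{-1})$ and $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$.


Proof Let $B = [b_{ij}]$ have the entries $b_{ij} = (-1)^{i+j} \det(A(j, i))$. Then $C = [c_{ij}] = \operatorname{adj}(A) A$ satisfies

\[
c_{ij} = \sum_{k=1}^{n} b_{ik} a_{kj} = \sum_{k=1}^{n} (-1)^{i+k} \det(A(k, i))\, a_{kj}.
\]

Let $a_\ell$ be the $\ell$th column of $A$ and let

\[
\widetilde{A}(k, i) := [a_1, \dots, a_{i-1}, e_k, a_{i+1}, \dots, a_n] \in R^{n,n},
\]

where $e_k$ is the $k$th column of the identity matrix $I_n$. Then there exist permutation
matrices $P$ and $Q$ that perform $k - 1$ row and $i - 1$ column exchanges, respectively,
such that

\[
P \widetilde{A}(k, i) Q = \begin{bmatrix} 1 & \star \\ 0 & A(k, i) \end{bmatrix}.
\]

Using (1) in Lemma 7.10 we obtain

\[
\begin{aligned}
\det(A(k, i)) &= \det\left( \begin{bmatrix} 1 & \star \\ 0 & A(k, i) \end{bmatrix} \right) = \det(P \widetilde{A}(k, i) Q) \\
&= \det(P) \det(\widetilde{A}(k, i)) \det(Q) \\
&= (-1)^{(k-1)+(i-1)} \det(\widetilde{A}(k, i)) \\
&= (-1)^{k+i} \det(\widetilde{A}(k, i)).
\end{aligned}
\]

The linearity of the determinant with respect to the columns now gives

\[
\begin{aligned}
c_{ij} &= \sum_{k=1}^{n} (-1)^{i+k} (-1)^{k+i} a_{kj} \det(\widetilde{A}(k, i)) \\
&= \det([a_1, \dots, a_{i-1}, a_j, a_{i+1}, \dots, a_n]) \\
&= \begin{cases} 0, & i \neq j, \\ \det(A), & i = j, \end{cases} \\
&= \delta_{ij} \det(A),
\end{aligned}
\]

and thus $\operatorname{adj}(A) A = \det(A) I_n$. Analogously we can show that $A \operatorname{adj}(A) = \det(A) I_n$.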
If $\det(A) \in R$ is invertible, then

\[
I_n = (\det(A))^{-1} \operatorname{adj}(A) A = A (\det(A))^{-1} \operatorname{adj}(A),
\]

i.e., $A$ is invertible with $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$. If, on the other hand, $A$ is invertible,
then

\[
1 = \det(I_n) = \det(A A^{-1}) = \det(A) \det(A^{-1}) = \det(A^{-1}) \det(A),
\]

where we have used the multiplication theorem for determinants over $R$ (cp. our
comment following the proof of Theorem 7.15). Thus, $\det(A)$ is invertible with
$(\det(A))^{-1} = \det(A^{-1})$, and again $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$. □
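
The identity $A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_n$ only involves ring operations, so it can be checked with integer matrices. The sketch below assumes small matrices stored as nested lists; the helper names (`det`, `minor`, `adj`, `matmul`) are hypothetical, and the determinant is evaluated with the signature formula (7.1), which is adequate only for small $n$.

```python
from itertools import permutations


def det(A):
    """Determinant by the signature formula (7.1); fine for small n."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = (-1) ** inv
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total


def minor(A, j, i):
    """A(j, i): delete row j and column i (0-based indices)."""
    return [row[:i] + row[i + 1:] for r, row in enumerate(A) if r != j]


def adj(A):
    """Adjunct of A: entry (i, j) is (-1)^(i+j) det(A(j, i)); requires n >= 2."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 3, 0], [1, 2, 0], [1, 2, 4]]
print(det(A))             # -4
print(matmul(A, adj(A)))  # -4 times the identity matrix
```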


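Example 7.19
(1) For

\[
A = \begin{bmatrix} 4 & 1 \\ 2 & 1 \end{bmatrix} \in \mathbb{Z}^{2,2}
\]

we have $\det(A) = 2$ and thus $A$ is not invertible. But $A$ is invertible when
considered as an element of $\mathbb{Q}^{2,2}$, since in this case $\det(A^{-1}) = (\det(A))^{-1} = \frac{1}{2}$.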
(2) For 


a=] ; Ti] e 


we have det(A) = 1. The matrix A is invertible, since 1 € Z[t] is invertible. 


Note that if $A \in R^{n,n}$ is invertible, then Theorem 7.18 shows that $A^{-1}$ can be
obtained by inverting only one ring element, $\det(A)$.

We now use Theorem 7.18 and the multiplication theorem for determinants of matrices over a
commutative ring with unit to prove a result already announced in Sect. 4.2: In order
to show that $\widetilde{A} \in R^{n,n}$ is the (unique) inverse of $A \in R^{n,n}$, only one of the two
equations $\widetilde{A} A = I_n$ or $A \widetilde{A} = I_n$ needs to be checked.


Corollary 7.20 Let $A \in R^{n,n}$. If a matrix $\widetilde{A} \in R^{n,n}$ exists with $\widetilde{A} A = I_n$ or $A \widetilde{A} = I_n$,
then $A$ is invertible and $\widetilde{A} = A^{-1}$.


Proof If $\widetilde{A} A = I_n$, then the multiplication theorem for determinants yields

\[
1 = \det(I_n) = \det(\widetilde{A} A) = \det(\widetilde{A}) \det(A) = \det(A) \det(\widetilde{A}),
\]

i.e., $\det(A) \in R$ is invertible with $(\det(A))^{-1} = \det(\widetilde{A})$. Thus also $A$ is invertible
and has a unique inverse $A^{-1}$. For $n = 1$ this is obvious and for $n \geq 2$ it was shown
in Theorem 7.18. If we multiply the equation $\widetilde{A} A = I_n$ from the right with $A^{-1}$ we
get $\widetilde{A} = A^{-1}$.

The proof starting from $A \widetilde{A} = I_n$ is analogous. □
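Let us summarize the invertibility criteria for a square matrix over a field that we
have shown so far:

\[
\begin{aligned}
A \in GL_n(K) \ &\overset{\text{Theorem 5.2}}{\Longleftrightarrow}\ \text{The echelon form of } A \text{ is the identity matrix } I_n \\
&\overset{\text{Definition 5.10}}{\Longleftrightarrow}\ \operatorname{rank}(A) = n \\
&\overset{\text{clear}}{\Longleftrightarrow}\ \operatorname{rank}(A) = \operatorname{rank}([A, b]) = n \text{ for all } b \in K^{n,1} \\
&\overset{\text{Algorithm 6.5}}{\Longleftrightarrow}\ |\mathcal{L}(A, b)| = 1 \text{ for all } b \in K^{n,1} \\
&\overset{\text{Theorem 7.18}}{\Longleftrightarrow}\ \det(A) \neq 0.
\end{aligned} \tag{7.3}
\]

Alternatively we obtain:

\[
\begin{aligned}
A \notin GL_n(K) \ &\overset{\text{Theorem 5.2}}{\Longleftrightarrow}\ \text{The echelon form of } A \text{ has at least one zero row} \\
&\overset{\text{Definition 5.10}}{\Longleftrightarrow}\ \operatorname{rank}(A) < n \\
&\overset{\text{clear}}{\Longleftrightarrow}\ \operatorname{rank}([A, 0]) < n \\
&\overset{\text{Algorithm 6.5}}{\Longleftrightarrow}\ \mathcal{L}(A, 0) \neq \{0\} \\
&\overset{\text{Theorem 7.18}}{\Longleftrightarrow}\ \det(A) = 0.
\end{aligned} \tag{7.4}
\]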


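In the fields $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ we have the (usual) absolute value $|\cdot|$ of numbers and
can formulate the following useful invertibility criterion for matrices.


Theorem 7.21 If $A \in K^{n,n}$ with $K \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$ is diagonally dominant, i.e., if

\[
|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \quad \text{for all } i = 1, \dots, n,
\]

then $\det(A) \neq 0$.


Proof We prove the assertion by contraposition, i.e., by showing that $\det(A) = 0$
implies that $A$ is not diagonally dominant.
If $\det(A) = 0$, then $\mathcal{L}(A, 0) \neq \{0\}$, i.e., the homogeneous linear system of
equations $Ax = 0$ has at least one solution $\hat x = [\hat x_1, \dots, \hat x_n]^T \neq 0$. Let $\hat x_m$ be an
entry of $\hat x$ with maximal absolute value, i.e., $|\hat x_m| \geq |\hat x_j|$ for all $j = 1, \dots, n$. In
particular, we then have $|\hat x_m| > 0$. The $m$th row of $A \hat x = 0$ is given by

\[
a_{m1} \hat x_1 + a_{m2} \hat x_2 + \dots + a_{mn} \hat x_n = 0 \quad \Longleftrightarrow \quad a_{mm} \hat x_m = - \sum_{\substack{j=1 \\ j \neq m}}^{n} a_{mj} \hat x_j.
\]

We now take absolute values on both sides and use the triangle inequality, which
yields

\[
|a_{mm}|\, |\hat x_m| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|\, |\hat x_j| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|\, |\hat x_m|, \quad \text{hence} \quad |a_{mm}| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|,
\]

so that $A$ is not diagonally dominant. □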
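
The criterion of Theorem 7.21 is cheap to test, since it only compares absolute values row by row. A minimal sketch follows; the function name `is_diagonally_dominant` is hypothetical and the matrix is assumed to be a square nested list of numbers.

```python
def is_diagonally_dominant(A):
    """True if |a_ii| exceeds the sum of |a_ij|, j != i, in every row i."""
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


print(is_diagonally_dominant([[3, 1, 1], [1, -4, 2], [0, 1, 2]]))  # True
print(is_diagonally_dominant([[1, 2], [1, 0]]))                    # False
```

A result of `True` guarantees a nonzero determinant by the theorem; a result of `False` allows no conclusion about the determinant.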


The converse of this theorem does not hold: For example, the matrix

\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} \in \mathbb{Q}^{2,2}
\]

has $\det(A) = -2 \neq 0$, but $A$ is not diagonally dominant.
From Theorem 7.18 we obtain the Laplace expansion⁴ of the determinant, which
is particularly useful when $A$ contains many zero entries (cp. Example 7.24 below).


Corollary 7.22 For $A \in R^{n,n}$, $n \geq 2$, the following assertions hold:


(1) For each $i = 1, 2, \dots, n$ we have

\[
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A(i, j)).
\]

(Laplace expansion of $\det(A)$ with respect to the $i$th row of $A$.)
(2) For each $j = 1, 2, \dots, n$ we have

\[
\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A(i, j)).
\]

(Laplace expansion of $\det(A)$ with respect to the $j$th column of $A$.)


Proof The two expansions for $\det(A)$ follow immediately by comparison of the
diagonal entries in the matrix equations $\det(A) I_n = A \operatorname{adj}(A)$ and $\det(A) I_n = \operatorname{adj}(A) A$. □


The Laplace expansion allows a recursive definition of the determinant: For $A \in R^{n,n}$
with $n \geq 2$, let $\det(A)$ be defined as in (1) or (2) in Corollary 7.22. We can choose
an arbitrary row or column of $A$. The formula for $\det(A)$ then contains only matrices
of size $(n-1) \times (n-1)$. For each of these we can use the Laplace expansion again, now
expressing each determinant in terms of determinants of $(n-2) \times (n-2)$ matrices.
We can do this recursively until only $1 \times 1$ matrices remain. For $A = [a_{11}] \in R^{1,1}$
we define $\det(A) := a_{11}$.
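
The recursive definition translates almost verbatim into code. The following sketch expands along the first row; the name `det_laplace` is hypothetical, and skipping zero entries reflects the remark that the expansion is most useful for matrices with many zeros. In general the recursion still needs on the order of $n!$ operations.

```python
def det_laplace(A):
    """Determinant by recursive Laplace expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue  # zero entries contribute nothing to the expansion
        # Delete the first row and column j to obtain the (n-1) x (n-1) matrix.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(sub)
    return total


print(det_laplace([[1, 3, 0], [1, 2, 0], [1, 2, 4]]))  # -4
```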

Finally we state Cramer's rule,⁵ which gives an explicit formula for the solution of
a linear system in form of determinants. This rule is only of theoretical value, because
in order to compute the $n$ components of the solution it requires the evaluation of
$n + 1$ determinants of $n \times n$ matrices.


⁴ Pierre-Simon Laplace (1749-1827) published this expansion in 1772.
⁵ Gabriel Cramer (1704-1752).
Corollary 7.23 Let $K$ be a field, $A \in GL_n(K)$ and $b \in K^{n,1}$. Then the unique
solution of the linear system of equations $Ax = b$ is given by

\[
\hat x = [\hat x_1, \dots, \hat x_n]^T = A^{-1} b = (\det(A))^{-1} \operatorname{adj}(A)\, b,
\]

with

\[
\hat x_i = \frac{\det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n])}{\det(A)}, \quad i = 1, \dots, n.
\]
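Example 7.24 Consider

\[
A = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 2 & 3 & 1 \end{bmatrix} \in \mathbb{Q}^{4,4}, \quad
b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix} \in \mathbb{Q}^{4,1}.
\]

The Laplace expansion with respect to the last column yields

\[
\det(A) = 1 \cdot \det\left( \begin{bmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix} \right) = 1 \cdot 1 \cdot \det\left( \begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix} \right) = -1.
\]

Thus, $A$ is invertible and $Ax = b$ has a unique solution $\hat x = A^{-1} b \in \mathbb{Q}^{4,1}$, which by
Cramer's rule has the following entries:

\[
\begin{aligned}
\hat x_1 &= \det\left( \begin{bmatrix} 1 & 3 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 2 & 3 & 1 \end{bmatrix} \right) \Big/ \det(A) = -4/(-1) = 4, \\
\hat x_2 &= \det\left( \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 3 & 1 \end{bmatrix} \right) \Big/ \det(A) = 1/(-1) = -1, \\
\hat x_3 &= \det\left( \begin{bmatrix} 1 & 3 & 1 & 0 \\ 1 & 2 & 2 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{bmatrix} \right) \Big/ \det(A) = 1/(-1) = -1, \\
\hat x_4 &= \det\left( \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 2 & 0 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{bmatrix} \right) \Big/ \det(A) = -1/(-1) = 1.
\end{aligned}
\]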
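
The computation in Example 7.24 can be reproduced with a short sketch of Cramer's rule over the rationals. The helper names `det` and `cramer` are hypothetical; the determinant is evaluated by Laplace expansion along the first row, and `fractions.Fraction` keeps the quotients exact. As noted above, this is of illustrative value only, since $n + 1$ determinants have to be evaluated.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum(
        (-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
        for j in range(len(A))
        if A[0][j] != 0
    )


def cramer(A, b):
    """Solve Ax = b for an invertible integer matrix A by Cramer's rule."""
    d = det(A)
    if d == 0:
        raise ValueError("A is not invertible")
    x = []
    for i in range(len(A)):
        # Replace column i of A by the right-hand side b.
        Ai = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(Fraction(det(Ai), d))
    return x


A = [[1, 3, 0, 0], [1, 2, 0, 0], [1, 2, 1, 0], [1, 2, 3, 1]]
b = [1, 2, 1, 0]
print(cramer(A, b))  # the entries 4, -1, -1, 1 as Fractions
```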
Exercises
7.1 A permutation $\sigma \in S_n$ is called an $r$-cycle if there exists a subset $\{i_1, \dots, i_r\} \subseteq \{1, 2, \dots, n\}$
with $r \geq 1$ elements and

\[
\sigma(i_k) = i_{k+1} \text{ for } k = 1, 2, \dots, r-1, \quad \sigma(i_r) = i_1, \quad \sigma(i) = i \text{ for } i \notin \{i_1, \dots, i_r\}.
\]

We write an $r$-cycle as $\sigma = (i_1, i_2, \dots, i_r)$. In particular, a transposition $\tau \in S_n$
is a 2-cycle.


(a) Let $n = 4$ and the 2-cycles $\tau_{1,2} = (1, 2)$, $\tau_{2,3} = (2, 3)$ and $\tau_{3,4} = (3, 4)$ be
given. Compute $\tau_{1,2} \circ \tau_{2,3}$, $\tau_{1,2} \circ \tau_{2,3} \circ \tau_{1,2}^{-1}$ and $\tau_{1,2} \circ \tau_{2,3} \circ \tau_{3,4}$.
(b) Let $n \geq 4$ and $\sigma = (1, 2, 3, 4)$. Determine $\sigma^j$ for $j = 2, 3, 4, 5$.
(c) Show that the inverse of the cycle $(i_1, \dots, i_r)$ is given by $(i_r, \dots, i_1)$.
(d) Show that two cycles with disjoint elements, i.e., $(i_1, \dots, i_r)$ and $(j_1, \dots, j_s)$
with $\{i_1, \dots, i_r\} \cap \{j_1, \dots, j_s\} = \emptyset$, commute.
(e) Show that every permutation $\sigma \in S_n$ can be written as product of disjoint
cycles that are, except for the order, uniquely determined by $\sigma$.


7.2 Prove Lemma 7.10 (1) using (7.1).

7.3 Show that the group homomorphism $\operatorname{sgn} : (S_n, \circ) \to (\{1, -1\}, \cdot)$ satisfies the
following assertions:
(a) The set $A_n = \{ \sigma \in S_n \mid \operatorname{sgn}(\sigma) = 1 \}$ is a subgroup of $S_n$ (cp. Exercise 3.8).
(b) For all $\sigma \in A_n$ and $\pi \in S_n$ we have $\pi \circ \sigma \circ \pi^{-1} \in A_n$.


7.4 Compute the determinants of the following matrices:


(a) $A = [e_n, e_{n-1}, \dots, e_1] \in \mathbb{Z}^{n,n}$, where $e_i$ is the $i$th column of the identity
matrix.
(b) $B = [b_{ij}] \in \mathbb{Z}^{n,n}$ with

\[
b_{ij} = \begin{cases} 2 & \text{for } |i - j| = 0, \\ -1 & \text{for } |i - j| = 1, \\ 0 & \text{for } |i - j| \geq 2. \end{cases}
\]


(c) 
10 1 0 0 0 O 
ey e <4 5 I ae 
21 2 Ve V7 v8 Vi0 
C=|e0 -e xr e 0 w | ER” 
e+ 0 10001 0 m! 0 er 
e&0 y2 0 0 0 -=l 
00 1 0 0 0 O 
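(d) The $4 \times 4$ Wilkinson matrix⁶ (cp. the MATLAB-Minute at the end of
Sect. 7.2).


7.5 Construct matrices $A, B \in \mathbb{R}^{n,n}$ for some $n \geq 2$ and with $\det(A + B) \neq \det(A) + \det(B)$.

7.6 Let $R$ be a commutative ring with unit, $n \geq 2$ and $A \in R^{n,n}$. Show that the
following assertions hold:


(a) $\operatorname{adj}(I_n) = I_n$.

(b) $\operatorname{adj}(AB) = \operatorname{adj}(B) \operatorname{adj}(A)$, if $A$ and $B \in R^{n,n}$ are invertible.

(c) $\operatorname{adj}(\lambda A) = \lambda^{n-1} \operatorname{adj}(A)$ for all $\lambda \in R$.

(d) $\operatorname{adj}(A^T) = \operatorname{adj}(A)^T$.

(e) $\det(\operatorname{adj}(A)) = (\det(A))^{n-1}$, if $A$ is invertible.

(f) $\operatorname{adj}(\operatorname{adj}(A)) = \det(A)^{n-2} A$.

(g) $\operatorname{adj}(A^{-1}) = \operatorname{adj}(A)^{-1}$, if $A$ is invertible.


Can one drop the requirement of invertibility in (b) or (e)?
7.7 Let $n \geq 2$ and $A = [a_{ij}] \in \mathbb{R}^{n,n}$ with $a_{ij} = \frac{1}{x_i + y_j}$ for some $x_1, \dots, x_n$,
$y_1, \dots, y_n \in \mathbb{R}$. Hence, in particular, $x_i + y_j \neq 0$ for all $i, j$. (Such a matrix $A$
is called a Cauchy matrix.⁷)
(a) Show that

\[
\det(A) = \frac{\prod_{1 \leq i < j \leq n} (x_j - x_i)(y_j - y_i)}{\prod_{i,j=1}^{n} (x_i + y_j)}.
\]

(b) Use (a) to derive a formula for the determinant of the $n \times n$ Hilbert matrix
(cp. the MATLAB-Minute above Definition 5.6).


7.8 Let $R$ be a commutative ring with unit. If $\alpha_1, \dots, \alpha_n \in R$, $n \geq 2$, then

\[
V_n := [\alpha_i^{j-1}] = \begin{bmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{n-1} \\ 1 & \alpha_2 & \cdots & \alpha_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_n & \cdots & \alpha_n^{n-1} \end{bmatrix} \in R^{n,n}
\]

is called a Vandermonde matrix.⁸


(a) Show that

\[
\det(V_n) = \prod_{1 \leq i < j \leq n} (\alpha_j - \alpha_i).
\]


⁶ James Hardy Wilkinson (1919-1986).
⁷ Augustin Louis Cauchy (1789-1857).
⁸ Alexandre-Théophile Vandermonde (1735-1796).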
(b) Let $K$ be a field and let $K[t]_{\leq n-1}$ be the set of polynomials in the variable
$t$ of degree at most $n - 1$. Show that two polynomials $p, q \in K[t]_{\leq n-1}$ are
equal if there exist pairwise distinct $\beta_1, \dots, \beta_n \in K$ with $p(\beta_j) = q(\beta_j)$.


7.9 Show the following assertions:


(a) Let $K$ be a field with $1 + 1 \neq 0$ and let $A \in K^{n,n}$ with $A^T = -A$. If $n$ is
odd, then $\det(A) = 0$.
(b) If $A \in GL_n(\mathbb{R})$ with $A^T = A^{-1}$, then $\det(A) \in \{1, -1\}$.


7.10 Let $K$ be a field and

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]

for some $A_{11} \in K^{n_1,n_1}$, $A_{12} \in K^{n_1,n_2}$, $A_{21} \in K^{n_2,n_1}$, $A_{22} \in K^{n_2,n_2}$. Show the
following assertions:


(a) If $A_{11} \in GL_{n_1}(K)$, then $\det(A) = \det(A_{11}) \det\left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)$.
(b) If $A_{22} \in GL_{n_2}(K)$, then $\det(A) = \det(A_{22}) \det\left( A_{11} - A_{12} A_{22}^{-1} A_{21} \right)$.
(c) If $A_{21} = 0$, then $\det(A) = \det(A_{11}) \det(A_{22})$.


Can you show this also when the matrices are defined over a commutative ring
with unit?

7.11 Construct matrices $A_{11}, A_{12}, A_{21}, A_{22} \in \mathbb{R}^{n,n}$ for $n \geq 2$ with

\[
\det\left( \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \right) \neq \det(A_{11}) \det(A_{22}) - \det(A_{12}) \det(A_{21}).
\]

7.12 Let $A = [a_{ij}] \in GL_n(\mathbb{R})$ with $a_{ij} \in \mathbb{Z}$ for $i, j = 1, \dots, n$. Show that the
following assertions hold:

(a) $A^{-1} \in \mathbb{Q}^{n,n}$.

(b) $A^{-1} \in \mathbb{Z}^{n,n}$ if and only if $\det(A) \in \{-1, 1\}$.

(c) The linear system of equations $Ax = b$ has a unique solution $\hat x \in \mathbb{Z}^{n,1}$ for
every $b \in \mathbb{Z}^{n,1}$ if and only if $\det(A) \in \{-1, 1\}$.


7.13 Show that $G = \{ A \in \mathbb{Z}^{n,n} \mid \det(A) \in \{-1, 1\} \}$ is a subgroup of $GL_n(\mathbb{Q})$.

Chapter 8 
The Characteristic Polynomial 
and Eigenvalues of Matrices 


We have already characterized matrices using their rank and their determinant. In this 
chapter we use the determinant map in order to assign to every square matrix a unique 
polynomial that is called the characteristic polynomial of the matrix. This polynomial 
contains important information about the matrix. For example, one can read off the 
determinant and thus see whether the matrix is invertible. Even more important are 
the roots of the characteristic polynomial, which are called the eigenvalues of the 
matrix. 


8.1 The Characteristic Polynomial 
and the Cayley-Hamilton Theorem 


Let $R$ be a commutative ring with unit and let $R[t]$ be the corresponding ring of
polynomials (cp. Example 3.17). For $A = [a_{ij}] \in R^{n,n}$ we set

\[
t I_n - A := \begin{bmatrix}
t - a_{11} & -a_{12} & \cdots & -a_{1n} \\
-a_{21} & t - a_{22} & \ddots & \vdots \\
\vdots & \ddots & \ddots & -a_{n-1,n} \\
-a_{n1} & \cdots & -a_{n,n-1} & t - a_{nn}
\end{bmatrix} \in (R[t])^{n,n}.
\]

The entries of the matrix $t I_n - A$ are elements of the commutative ring with unit
$R[t]$, where the diagonal entries are polynomials of degree 1, and the other entries
are constant polynomials. Using Definition 7.4 we can form the determinant of the
matrix $t I_n - A$, which is an element of $R[t]$.
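Definition 8.1 Let $R$ be a commutative ring with unit and $A \in R^{n,n}$. Then

\[
P_A := \det(t I_n - A) \in R[t]
\]

is called the characteristic polynomial of $A$.

Example 8.2 If $n = 1$ and $A = [a_{11}]$, then

\[
P_A = \det(t I_1 - A) = \det([t - a_{11}]) = t - a_{11}.
\]

For $n = 2$ and

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

we obtain

\[
P_A = \det\left( \begin{bmatrix} t - a_{11} & -a_{12} \\ -a_{21} & t - a_{22} \end{bmatrix} \right) = t^2 - (a_{11} + a_{22})\, t + (a_{11} a_{22} - a_{12} a_{21}).
\]

Using Definition 7.4 we see that the general form of $P_A$ for a matrix $A \in R^{n,n}$ is
given by

\[
P_A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} \left( \delta_{i,\sigma(i)}\, t - a_{i,\sigma(i)} \right). \tag{8.1}
\]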
The following lemma presents basic properties of the characteristic polynomial. 


Lemma 8.3 For $A \in R^{n,n}$ we have $P_A = P_{A^T}$ and

$$P_A = t^n - \alpha_{n-1} t^{n-1} + \dots + (-1)^{n-1} \alpha_1 t + (-1)^n \alpha_0$$

with $\alpha_{n-1} = \sum_{i=1}^{n} a_{ii}$ and $\alpha_0 = \det(A)$.

Proof Using (5) in Lemma 7.10 we obtain

$$P_A = \det(tI_n - A) = \det((tI_n - A)^T) = \det(tI_n - A^T) = P_{A^T}.$$

Using $P_A$ as in (8.1) we see that

$$P_A = \prod_{i=1}^{n} (t - a_{ii}) + \sum_{\substack{\sigma \in S_n \\ \sigma \neq [1\ \cdots\ n]}} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left( \delta_{i,\sigma(i)}\, t - a_{i,\sigma(i)} \right).$$

The first term on the right hand side is of the form

$$t^n - \Big( \sum_{i=1}^{n} a_{ii} \Big) t^{n-1} + (\text{polynomial of degree} \leq n - 2),$$

and the second term is a polynomial of degree $\leq n - 2$. Thus, $\alpha_{n-1} = \sum_{i=1}^{n} a_{ii}$ as
claimed. Moreover, Definition 8.1 yields

$$P_A(0) = \det(-A) = (-1)^n \det(A),$$

so that $\alpha_0 = \det(A)$. □
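
Formula (8.1) can be turned directly into a small computation. The following is a minimal sketch, assuming integer matrices stored as lists of rows and polynomials stored as coefficient lists `[c0, c1, ..., cn]`; the helper names `perm_sign`, `poly_mul` and `char_poly` are illustrative choices, not notation from the text. Summing over all permutations is only practical for small n.

```python
from itertools import permutations


def perm_sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def char_poly(A):
    """Coefficients [c0, ..., cn] of det(t*I - A), summed as in formula (8.1)."""
    n = len(A)
    total = [0] * (n + 1)
    for sigma in permutations(range(n)):
        term = [perm_sign(sigma)]
        for i in range(n):
            # factor (delta_{i,sigma(i)} * t - a_{i,sigma(i)}) as [constant, t-coefficient]
            factor = [-A[i][sigma[i]], 1 if sigma[i] == i else 0]
            term = poly_mul(term, factor)
        for k, c in enumerate(term):
            total[k] += c
    return total


A = [[2, 1, 0], [0, 3, 4], [1, 0, 5]]
coeffs = char_poly(A)  # t^3 - 10 t^2 + 31 t - 34
trace_A = sum(A[i][i] for i in range(3))
```

For this example the coefficient of t^2 is -10 = -trace(A) and the constant term is -34 = (-1)^3 det(A), in agreement with Lemma 8.3.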


This lemma shows that the characteristic polynomial of $A \in R^{n,n}$ always is of
degree $n$. The coefficient of $t^n$ is $1 \in R$. Such a polynomial is called monic. Up to
sign, the coefficient of $t^{n-1}$ is given by the sum of the diagonal entries of $A$: it
equals $-\alpha_{n-1}$. The sum of the diagonal entries is called the trace of $A$, i.e.,

$$\mathrm{trace}(A) := \sum_{i=1}^{n} a_{ii}.$$

The following lemma shows that for every monic polynomial $p \in R[t]$ of degree
$n \geq 1$ there exists a matrix $A \in R^{n,n}$ with $P_A = p$.


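Lemma 8.4 If $n \in \mathbb{N}$ and $p = t^n + \beta_{n-1} t^{n-1} + \dots + \beta_0 \in R[t]$, then $p$ is the
characteristic polynomial of the matrix

$$
A = \begin{bmatrix}
0 & & & -\beta_0 \\
1 & \ddots & & \vdots \\
 & \ddots & 0 & -\beta_{n-2} \\
 & & 1 & -\beta_{n-1}
\end{bmatrix} \in R^{n,n}.
$$

(For $n = 1$ we have $A = [-\beta_0]$.) The matrix $A$ is called the companion matrix of $p$.

Proof We prove the assertion by induction on $n$.

For $n = 1$ we have $p = t + \beta_0$, $A = [-\beta_0]$ and $P_A = \det([t + \beta_0]) = p$.

Let the assertion hold for some $n \geq 1$. We consider $p = t^{n+1} + \beta_n t^n + \dots + \beta_0$
and its companion matrix

$$
A = \begin{bmatrix}
0 & & & -\beta_0 \\
1 & \ddots & & \vdots \\
 & \ddots & 0 & -\beta_{n-1} \\
 & & 1 & -\beta_n
\end{bmatrix} \in R^{n+1,n+1}.
$$

Using the Laplace expansion with respect to the first row (cp. Corollary 7.22) and
the induction hypothesis we get

$$
\begin{aligned}
P_A &= \det(tI_{n+1} - A)
= \det\left( \begin{bmatrix}
t & & & & \beta_0 \\
-1 & t & & & \beta_1 \\
 & \ddots & \ddots & & \vdots \\
 & & -1 & t & \beta_{n-1} \\
 & & & -1 & t + \beta_n
\end{bmatrix} \right) \\
&= t \cdot \det\left( \begin{bmatrix}
t & & & \beta_1 \\
-1 & \ddots & & \vdots \\
 & \ddots & t & \beta_{n-1} \\
 & & -1 & t + \beta_n
\end{bmatrix} \right)
+ (-1)^{n+2} \cdot \beta_0 \cdot \det\left( \begin{bmatrix}
-1 & t & & \\
 & \ddots & \ddots & \\
 & & \ddots & t \\
 & & & -1
\end{bmatrix} \right) \\
&= t \cdot (t^n + \beta_n t^{n-1} + \dots + \beta_1) + (-1)^{2n+2} \beta_0 \\
&= t^{n+1} + \beta_n t^n + \dots + \beta_1 t + \beta_0 = p.
\end{aligned}
$$

□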


Example 8.5 The polynomial $p = (t - 1)^3 = t^3 - 3t^2 + 3t - 1 \in \mathbb{Z}[t]$ has the
companion matrix

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 3 \end{bmatrix} \in \mathbb{Z}^{3,3}.$$

The identity matrix $I_3$ has the characteristic polynomial

$$P_{I_3} = \det(tI_3 - I_3) = (t - 1)^3 = P_A.$$

Thus, different matrices may have the same characteristic polynomial.
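
The construction of Lemma 8.4 is easy to sketch in code. The block below builds the companion matrix from the coefficients β_0, ..., β_{n-1} and, as a sanity check rather than a proof, compares det(tI - A) with p(t) at several integer values of t. The function names and the determinant by Laplace expansion are illustrative assumptions, suitable only for small n.

```python
def companion(beta):
    """Companion matrix of t^n + beta[n-1]*t^(n-1) + ... + beta[0] as in Lemma 8.4."""
    n = len(beta)
    A = [[0] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = 1          # ones on the subdiagonal
    for i in range(n):
        A[i][n - 1] = -beta[i]   # last column holds -beta_0, ..., -beta_{n-1}
    return A


def det(M):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def char_poly_value(A, t):
    """Value of det(t*I - A) at the scalar t."""
    n = len(A)
    return det([[(t if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])


beta = [-1, 3, -3]                 # p = t^3 - 3t^2 + 3t - 1 = (t - 1)^3
A = companion(beta)                # the matrix of Example 8.5
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
points = range(-3, 4)
agree_with_p = all(char_poly_value(A, t) == (t - 1) ** 3 for t in points)
same_as_identity = all(char_poly_value(A, t) == char_poly_value(I3, t) for t in points)
```

Seven evaluation points are more than enough to pin down a polynomial of degree 3 over the integers, so the two checks reproduce the observation of Example 8.5 that A and the identity matrix share the same characteristic polynomial.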


In Example 3.17 we have seen how to evaluate a polynomial $p \in R[t]$ at a scalar
$\lambda \in R$. Analogously, we can evaluate $p$ at a matrix $M \in R^{m,m}$ (cp. Exercise 4.8).
For

$$p = \beta_n t^n + \beta_{n-1} t^{n-1} + \dots + \beta_0 \in R[t]$$

we define

$$p(M) := \beta_n M^n + \beta_{n-1} M^{n-1} + \dots + \beta_0 I_m \in R^{m,m},$$

where the multiplication on the right hand side is the scalar multiplication of $\beta_j \in R$
and $M^j \in R^{m,m}$, $j = 0, 1, \dots, n$. (Recall that $M^0 = I_m$.) Evaluating a given
polynomial at matrices $M \in R^{m,m}$ therefore defines a map from $R^{m,m}$ to $R^{m,m}$.

In particular, using (8.1), the characteristic polynomial $P_A$ of $A \in R^{n,n}$ satisfies

$$P_A(M) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left( \delta_{i,\sigma(i)}\, M - a_{i,\sigma(i)}\, I_m \right) \quad \text{for all } M \in R^{m,m}.$$

Note that for $M \in R^{n,n}$ and $P_A = \det(tI_n - A)$ the “obvious” equation $P_A(M) =
\det(M - A)$ is wrong. By definition, $P_A(M) \in R^{n,n}$ and $\det(M - A) \in R$, so that
the two expressions cannot be the same, even for $n = 1$.
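
As an illustration of evaluating a polynomial at a matrix, here is a minimal sketch that computes p(M) with Horner's scheme. Horner's scheme is a choice made for this example and is not taken from the text; the coefficient-list convention `[beta_0, ..., beta_n]` and the function names are likewise assumptions.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def poly_at_matrix(coeffs, M):
    """Evaluate p(M) for p = coeffs[0] + coeffs[1]*t + ... + coeffs[n]*t^n."""
    m = len(M)
    result = [[0] * m for _ in range(m)]
    for c in reversed(coeffs):
        # Horner step: result <- result * M + c * I_m
        result = mat_mul(result, M)
        for i in range(m):
            result[i][i] += c
    return result


M = [[1, 2], [0, 3]]
p_of_M = poly_at_matrix([2, -3, 1], M)   # p = t^2 - 3t + 2
```

Here p(M) = M^2 - 3M + 2 I_2 is again a 2 x 2 matrix, namely [[0, 2], [0, 2]], which illustrates why p(M) cannot coincide with a determinant (a single element of R).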


The following result is called the Cayley-Hamilton theorem.¹

Theorem 8.6 For every matrix $A \in R^{n,n}$ and its characteristic polynomial $P_A \in
R[t]$ we have $P_A(A) = 0 \in R^{n,n}$.

Proof For $n = 1$ we have $A = [a_{11}]$ and $P_A = t - a_{11}$, so that $P_A(A) = [a_{11}] -
[a_{11}] = [0]$.

Let now $n \geq 2$ and let $e_i$ be the $i$th column of the identity matrix $I_n \in R^{n,n}$. Then

$$A e_i = a_{1i} e_1 + a_{2i} e_2 + \dots + a_{ni} e_n, \quad i = 1, \dots, n,$$

which is equivalent to

$$(A - a_{ii} I_n)\, e_i + \sum_{\substack{j=1 \\ j \neq i}}^{n} (-a_{ji} I_n)\, e_j = 0, \quad i = 1, \dots, n.$$

The last $n$ equations can be written as

$$
\begin{bmatrix}
A - a_{11} I_n & -a_{21} I_n & \cdots & -a_{n1} I_n \\
-a_{12} I_n & A - a_{22} I_n & \cdots & -a_{n2} I_n \\
\vdots & \vdots & \ddots & \vdots \\
-a_{1n} I_n & -a_{2n} I_n & \cdots & A - a_{nn} I_n
\end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\quad \text{or} \quad B\varepsilon = 0.
$$

Hence $B \in (R[A])^{n,n}$ with $R[A] := \{p(A) \mid p \in R[t]\} \subset R^{n,n}$. The set $R[A]$ forms
a commutative ring with unit given by the identity matrix $I_n$ (cp. Exercise 4.8). Using
Theorem 7.18 we obtain

$$\mathrm{adj}(B)\, B = \det(B)\, \widehat{I}_n,$$

where $\det(B) \in R[A]$ and $\widehat{I}_n$ is the identity matrix in $(R[A])^{n,n}$. (This matrix has $n$
times the identity matrix $I_n$ on its diagonal.) Multiplying this equation from the right
by $\varepsilon$ yields

$$\mathrm{adj}(B)\, B \varepsilon = \det(B)\, \widehat{I}_n\, \varepsilon.$$

Since $B\varepsilon = 0$, the left hand side is zero, so that $\det(B)\, e_i = 0$ for $i = 1, \dots, n$,
which implies that $\det(B) = 0 \in R^{n,n}$. Finally, using Lemma 8.3 gives

$$
\begin{aligned}
0 = \det(B) &= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} b_{i,\sigma(i)} \\
&= \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left( \delta_{i,\sigma(i)}\, A - a_{\sigma(i),i}\, I_n \right) \\
&= P_{A^T}(A) \\
&= P_A(A),
\end{aligned}
$$

which completes the proof. □

¹ Arthur Cayley (1821–1895) showed this theorem in 1858 for n = 2 and claimed that he had verified
it for n = 3. He did not feel it necessary to give a proof for general n. Sir William Rowan Hamilton
(1805–1865) proved the theorem for the case n = 4 in 1853 in the context of his investigations of
quaternions. One of the first proofs for general n was given by Ferdinand Georg Frobenius (1849–1917)
in 1878. James Joseph Sylvester (1814–1897) coined the name of the theorem in 1884 by
calling it the “no-little-marvelous Hamilton-Cayley theorem”.
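
For n = 2 the Cayley-Hamilton theorem can be checked by hand using the formula from Example 8.2, P_A = t^2 - (a_11 + a_22) t + (a_11 a_22 - a_12 a_21). The following sketch evaluates P_A(A) = A^2 - trace(A) A + det(A) I_2 entry by entry; the function name is a hypothetical choice for this illustration.

```python
def cayley_hamilton_residual_2x2(A):
    """Return P_A(A) = A^2 - trace(A)*A + det(A)*I_2 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    A_squared = [[a * a + b * c, a * b + b * d],
                 [c * a + d * c, c * b + d * d]]
    identity = [[1, 0], [0, 1]]
    return [
        [A_squared[i][j] - trace * A[i][j] + det * identity[i][j] for j in range(2)]
        for i in range(2)
    ]


residual_1 = cayley_hamilton_residual_2x2([[4, 4], [-1, 0]])
residual_2 = cayley_hamilton_residual_2x2([[3, 3], [1, 1]])
```

Both residuals are the 2 x 2 zero matrix, as Theorem 8.6 predicts.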


8.2 Eigenvalues and Eigenvectors 


In this section we present an introduction to the topic of eigenvalues and eigenvectors 
of square matrices over a field K. These concepts will be studied in more detail in 
later chapters. 


Definition 8.7 Let $A \in K^{n,n}$. If $\lambda \in K$ and $v \in K^{n,1} \setminus \{0\}$ satisfy $Av = \lambda v$, then $\lambda$
is called an eigenvalue of $A$ and $v$ is called an eigenvector of $A$ corresponding to $\lambda$.

While by definition $v = 0$ can never be an eigenvector of a matrix, $\lambda = 0$ may be
an eigenvalue. For example,

$$\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

If $v$ is an eigenvector corresponding to the eigenvalue $\lambda$ of $A$ and $\alpha \in K \setminus \{0\}$, then
$\alpha v \neq 0$ and

$$A(\alpha v) = \alpha (Av) = \alpha (\lambda v) = \lambda (\alpha v).$$

Thus, also $\alpha v$ is an eigenvector of $A$ corresponding to $\lambda$.

Theorem 8.8 For $A \in K^{n,n}$ the following assertions hold:

(1) $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is a root of the characteristic polynomial
of $A$, i.e., $P_A(\lambda) = 0 \in K$.

(2) $\lambda = 0$ is an eigenvalue of $A$ if and only if $\det(A) = 0$.

(3) $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is an eigenvalue of $A^T$.

Proof

(1) The equation $P_A(\lambda) = \det(\lambda I_n - A) = 0$ holds if and only if the matrix $\lambda I_n - A$
is not invertible (cp. (7.4)), and this is equivalent to $\mathcal{L}(\lambda I_n - A, 0) \neq \{0\}$.
This, however, means that there exists a vector $x \neq 0$ with $(\lambda I_n - A)x = 0$, or
$Ax = \lambda x$.

(2) By (1), $\lambda = 0$ is an eigenvalue of $A$ if and only if $P_A(0) = 0$. The assertion now
follows from $P_A(0) = (-1)^n \det(A)$ (cp. Lemma 8.3).

(3) This follows from (1) and $P_A = P_{A^T}$ (cp. Lemma 8.3). □


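Whether a matrix $A \in K^{n,n}$ has eigenvalues or not may depend on the field $K$
over which $A$ is considered.

Example 8.9 The matrix

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \in \mathbb{R}^{2,2}$$

has the characteristic polynomial $P_A = t^2 + 1 \in \mathbb{R}[t]$. This polynomial does not
have roots, since the equation $t^2 + 1 = 0$ has no (real) solutions. If we consider $A$ as
an element of $\mathbb{C}^{2,2}$, then $P_A \in \mathbb{C}[t]$ has the roots $i$ and $-i$. Then these two complex
numbers are the eigenvalues of $A$.

Item (3) in Theorem 8.8 shows that $A$ and $A^T$ have the same eigenvalues. An
eigenvector of $A$, however, may not be an eigenvector of $A^T$.

Example 8.10 The matrix

$$A = \begin{bmatrix} 3 & 3 \\ 1 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}$$

has the characteristic polynomial $P_A = t^2 - 4t = t \cdot (t - 4)$, and hence its eigenvalues
are 0 and 4. We have

$$A \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{and} \quad A^T \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \neq \lambda \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

for all $\lambda \in \mathbb{R}$. Thus, $[1, -1]^T$ is an eigenvector of $A$ corresponding to the
eigenvalue 0, but it is not an eigenvector of $A^T$. On the other hand,

$$A^T \begin{bmatrix} 1 \\ -3 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -3 \end{bmatrix} \quad \text{and} \quad A \begin{bmatrix} 1 \\ -3 \end{bmatrix} = \begin{bmatrix} -6 \\ -2 \end{bmatrix} \neq \lambda \begin{bmatrix} 1 \\ -3 \end{bmatrix}$$

for all $\lambda \in \mathbb{R}$. Thus, $[1, -3]^T$ is an eigenvector of $A^T$ corresponding to the
eigenvalue 0, but it is not an eigenvector of $A$.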
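
The computations of Example 8.10 can be reproduced with exact rational arithmetic. The sketch below tests whether a given vector is an eigenvector by comparing Av with a candidate multiple of v; the helper names are illustrative assumptions, and the vector [3, 1]^T for the eigenvalue 4 is an additional check that does not appear in the example.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def eigenvalue_for(A, v):
    """Return lam with A v = lam v if v is an eigenvector of A, otherwise None."""
    if all(x == 0 for x in v):
        return None                      # the zero vector is never an eigenvector
    Av = mat_vec(A, v)
    k = next(i for i, x in enumerate(v) if x != 0)
    lam = Fraction(Av[k], v[k])          # the only possible candidate
    return lam if Av == [lam * x for x in v] else None


A = [[3, 3], [1, 1]]
At = transpose(A)
results = {
    "A, [1,-1]": eigenvalue_for(A, [1, -1]),     # eigenvalue 0
    "At, [1,-1]": eigenvalue_for(At, [1, -1]),   # not an eigenvector
    "At, [1,-3]": eigenvalue_for(At, [1, -3]),   # eigenvalue 0
    "A, [1,-3]": eigenvalue_for(A, [1, -3]),     # not an eigenvector
    "A, [3,1]": eigenvalue_for(A, [3, 1]),       # eigenvalue 4
}
```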


Theorem 8.8 implies further criteria for the invertibility of $A \in K^{n,n}$ (cp. (7.3)):

$$A \in GL_n(K) \;\Leftrightarrow\; 0 \text{ is not an eigenvalue of } A \;\Leftrightarrow\; 0 \text{ is not a root of } P_A.$$

Definition 8.11 Two matrices $A, B \in K^{n,n}$ are called similar, if there exists a matrix
$Z \in GL_n(K)$ with $A = ZBZ^{-1}$.

One can easily show that this defines an equivalence relation on the set $K^{n,n}$ (cp.
the proof following Definition 5.13).

Theorem 8.12 If two matrices $A, B \in K^{n,n}$ are similar, then $P_A = P_B$.

Proof If $A = ZBZ^{-1}$, then the multiplication theorem for determinants yields

$$
\begin{aligned}
P_A &= \det(tI_n - A) = \det(tI_n - ZBZ^{-1}) = \det(Z(tI_n - B)Z^{-1}) \\
&= \det(Z) \det(tI_n - B) \det(Z^{-1}) = \det(tI_n - B) \det(ZZ^{-1}) = \det(tI_n - B) = P_B
\end{aligned}
$$

(cp. the remarks below Theorem 7.15). □
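
Theorem 8.12 can be illustrated for 2 x 2 matrices over Q. The sketch below forms A = Z B Z^{-1} with exact fractions and compares the coefficients of the two characteristic polynomials, using the formula from Example 8.2. The matrices B and Z are arbitrary choices for this illustration, and the helper names are assumptions.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def inverse_2x2(Z):
    """Inverse of an invertible 2x2 matrix with integer entries, computed over Q."""
    (a, b), (c, d) = Z
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def char_poly_2x2(A):
    """Coefficients (c0, c1, c2) of t^2 - trace(A)*t + det(A), cp. Example 8.2."""
    (a, b), (c, d) = A
    return (a * d - b * c, -(a + d), 1)


B = [[1, 2], [3, 4]]
Z = [[2, 1], [1, 1]]
A = mat_mul(mat_mul(Z, B), inverse_2x2(Z))   # A is similar to B
```

Here A = [[-3, 11], [-2, 8]] differs from B, yet both matrices have the characteristic polynomial t^2 - 5t - 2.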


Theorem 8.12 and (1) in Theorem 8.8 show that two similar matrices have the same
eigenvalues. The condition that $A$ and $B$ are similar is sufficient, but not necessary
for $P_A = P_B$.

Example 8.13 Let

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2.$$

Then $P_A = (t - 1)^2 = P_B$, but for every matrix $Z \in GL_2(K)$ we have $ZBZ^{-1} =
I_2 \neq A$. Thus, we have $P_A = P_B$ although $A$ and $B$ are not similar (cp. also
Example 8.5).

MATLAB-Minute.

The roots of a polynomial $p = \alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_0$ can be computed
(or approximated) in MATLAB using the command roots(p), where p is a
$1 \times (n+1)$ matrix with the entries p(i) $= \alpha_{n+1-i}$ for $i = 1, \dots, n+1$. Compute
roots(p) for the monic polynomial $p = t^3 - 3t^2 + 3t - 1 \in \mathbb{R}[t]$ and display
the output using format long. What are the exact roots of $p$ and how large
is the numerical error in the computation of the roots using roots(p)?

Form the matrix A=compan(p) and compare its structure with the one of the
companion matrix from Lemma 8.4. Can you transfer the proof of Lemma 8.4
to the structure of the matrix A?

Compute the eigenvalues of A with the command eig(A) and compare the
output with the one of roots(p). What do you observe?


8.3 Eigenvectors of Stochastic Matrices 


We now consider the eigenvalue problem presented in Sect. 1.1 in the context of
the PageRank algorithm. The mathematical modeling leads to the equations (1.1),
which can be written in the form $Ax = x$. Here $A = [a_{ij}] \in \mathbb{R}^{n,n}$ ($n$ is the number
of documents) satisfies

$$a_{ij} \geq 0 \quad \text{and} \quad \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for } j = 1, \dots, n.$$

Such a matrix $A$ is called column-stochastic. Note that $A$ is column-stochastic if
and only if $A^T$ is row-stochastic. Such matrices also occurred in the car insurance
application considered in Sect. 1.2 and Example 4.7. We want to determine $x =
[x_1, \dots, x_n]^T \in \mathbb{R}^{n,1} \setminus \{0\}$ with $Ax = x$, where the entry $x_i$ describes the importance
of document $i$. The importance values should be nonnegative, i.e., $x_i \geq 0$ for $i =
1, \dots, n$. Thus, we want to determine an entrywise nonnegative eigenvector of $A$
corresponding to the eigenvalue $\lambda = 1$.

We first check whether this problem has a solution, and then study whether the
solution is unique. Our presentation is based on the article [BryL06].

Lemma 8.14 A column-stochastic matrix $A \in \mathbb{R}^{n,n}$ has an eigenvector corresponding
to the eigenvalue 1.

Proof Since $A$ is column-stochastic, we have $A^T [1, \dots, 1]^T = [1, \dots, 1]^T$, so that 1
is an eigenvalue of $A^T$. Now (3) in Theorem 8.8 shows that also $A$ has the eigenvalue
1, and hence there exists a corresponding eigenvector. □

A matrix with real entries is called positive, if all its entries are positive.

Lemma 8.15 If $A \in \mathbb{R}^{n,n}$ is positive and column-stochastic and if $x \in \mathbb{R}^{n,1}$ is an
eigenvector of $A$ corresponding to the eigenvalue 1, then either $x$ or $-x$ is positive.

Proof If $x = [x_1, \dots, x_n]^T$ is an eigenvector of $A = [a_{ij}]$ corresponding to the
eigenvalue 1, then

$$x_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \dots, n.$$

Suppose that the entries of $x$ are neither all positive nor all negative. Then
there exists at least one index $k$ with

$$|x_k| = \Big| \sum_{j=1}^{n} a_{kj} x_j \Big| < \sum_{j=1}^{n} a_{kj} |x_j|,$$

which implies

$$\sum_{i=1}^{n} |x_i| < \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} |x_j| = \sum_{j=1}^{n} \Big( \underbrace{\sum_{i=1}^{n} a_{ij}}_{=1} \Big) |x_j| = \sum_{j=1}^{n} |x_j|.$$

This is impossible, so that indeed $x$ or $-x$ must be positive. □
We can now prove the following uniqueness result. 


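Theorem 8.16 If $A \in \mathbb{R}^{n,n}$ is positive and column-stochastic, then there exists a
unique positive $x = [x_1, \dots, x_n]^T \in \mathbb{R}^{n,1}$ with $\sum_{i=1}^{n} x_i = 1$ and $Ax = x$.

Proof By Lemmas 8.14 and 8.15, $A$ has at least one positive eigenvector corresponding
to the eigenvalue 1. Suppose that $x^{(1)} = [x_1^{(1)}, \dots, x_n^{(1)}]^T$ and $x^{(2)} = [x_1^{(2)}, \dots, x_n^{(2)}]^T$
are two such eigenvectors. Suppose that these are normalized by $\sum_{i=1}^{n} x_i^{(j)} = 1$,
$j = 1, 2$. This assumption can be made without loss of generality, since every
nonzero multiple of an eigenvector is still an eigenvector.

We will show that $x^{(1)} = x^{(2)}$. For $\alpha \in \mathbb{R}$ we define $x(\alpha) := x^{(1)} + \alpha x^{(2)} \in \mathbb{R}^{n,1}$,
then

$$A x(\alpha) = A x^{(1)} + \alpha A x^{(2)} = x^{(1)} + \alpha x^{(2)} = x(\alpha).$$

If $\widetilde{\alpha} := -x_1^{(1)} / x_1^{(2)}$, then the first entry of $x(\widetilde{\alpha})$ is equal to zero and thus, by
Lemma 8.15, $x(\widetilde{\alpha})$ cannot be an eigenvector of $A$ corresponding to the eigenvalue 1.
Now $A x(\widetilde{\alpha}) = x(\widetilde{\alpha})$ implies that $x(\widetilde{\alpha}) = 0$, and hence

$$x_i^{(1)} + \widetilde{\alpha}\, x_i^{(2)} = 0, \quad i = 1, \dots, n. \qquad (8.2)$$

Summing up these $n$ equations yields

$$\underbrace{\sum_{i=1}^{n} x_i^{(1)}}_{=1} + \widetilde{\alpha} \underbrace{\sum_{i=1}^{n} x_i^{(2)}}_{=1} = 0,$$

so that $\widetilde{\alpha} = -1$. From (8.2) we get $x_i^{(1)} = x_i^{(2)}$ for $i = 1, \dots, n$, and therefore
$x^{(1)} = x^{(2)}$. □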


The unique positive eigenvector $x$ in Theorem 8.16 is called the Perron eigenvector²
of the positive matrix $A$. The theory of eigenvalues and eigenvectors of positive
(or more general nonnegative) matrices is an important area of Matrix Theory, since
these matrices arise in many applications.

By construction, the matrix $A \in \mathbb{R}^{n,n}$ in the PageRank algorithm is column-stochastic
but not positive, since there are (usually many) entries $a_{ij} = 0$. In order
to obtain a uniquely solvable problem one can use the following trick:

Let $S = [s_{ij}] \in \mathbb{R}^{n,n}$ with $s_{ij} = 1/n$. Obviously, $S$ is positive and column-stochastic.
For a real number $\alpha \in (0, 1]$ we define the matrix

$$A(\alpha) := (1 - \alpha) A + \alpha S.$$

This matrix is positive and column-stochastic, and hence it has a unique positive
eigenvector $\widehat{u}$ corresponding to the eigenvalue 1 whose entries sum to 1
(cp. Theorem 8.16). We thus have

$$\widehat{u} = A(\alpha)\, \widehat{u} = (1 - \alpha) A \widehat{u} + \frac{\alpha}{n} [1, \dots, 1]^T.$$

For a very large number of documents (e.g. the entire internet) the number $\alpha / n$ is
very small, so that $(1 - \alpha) A \widehat{u} \approx \widehat{u}$. Therefore a solution of the eigenvalue problem
$A(\alpha) \widehat{u} = \widehat{u}$ for small $\alpha$ potentially gives a good approximation of a $u \in \mathbb{R}^{n,1}$ that
satisfies $Au = u$. The practical solution of the eigenvalue problem with the matrix
$A(\alpha)$ is a topic of the field of Numerical Linear Algebra.

The matrix $S$ represents a link structure where all documents are mutually linked
and thus all documents are equally important. The matrix $A(\alpha) = (1 - \alpha) A + \alpha S$
therefore models the following internet “surfing behavior”: A user follows a proposed
link with the probability $1 - \alpha$ and an arbitrary link with the probability $\alpha$. Originally,
Google Inc. used the value $\alpha = 0.15$.
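
To make the construction of A(α) concrete, the following sketch builds A(α) for a small, hypothetical link matrix with four documents and approximates its positive eigenvector for the eigenvalue 1. The approximation uses the power iteration, a standard numerical technique that is an assumption of this example and is not discussed in the text; the link matrix, the number of steps and the function names are illustrative as well.

```python
def damped_matrix(A, alpha):
    """Return A(alpha) = (1 - alpha) * A + alpha * S, where S has all entries 1/n."""
    n = len(A)
    return [[(1 - alpha) * A[i][j] + alpha / n for j in range(n)] for i in range(n)]


def power_iteration(M, steps=200):
    """Apply M repeatedly to a positive start vector, rescaling to entry sum 1."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x)
        x = [xi / total for xi in x]
    return x


# Hypothetical column-stochastic link matrix (every column sums to 1).
A = [
    [0.0, 0.0, 1.0, 0.5],
    [1 / 3, 0.0, 0.0, 0.0],
    [1 / 3, 0.5, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
]

M = damped_matrix(A, 0.15)
u = power_iteration(M)
residual = max(abs(sum(M[i][j] * u[j] for j in range(4)) - u[i]) for i in range(4))
```

The computed vector `u` has positive entries that sum to 1, and `residual` measures how well A(α)u = u is satisfied in floating point arithmetic.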


² Oskar Perron (1880–1975).

Exercises

(In the following exercises $K$ is an arbitrary field.)

8.1 Determine the characteristic polynomials of the following matrices over $\mathbb{Q}$:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 4 \\ -1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad D = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & \ast \\ \ast & \ast & \ast \end{bmatrix}.$$

(Entries marked $\ast$ are not recoverable from the source.)
Verify the Cayley-Hamilton theorem in each case by direct computation. Are
two of the matrices $A$, $B$, $C$ similar?

8.2 Let $R$ be a commutative ring with unit and $n \geq 2$.

(a) Show that for every $A \in GL_n(R)$ there exists a polynomial $p \in R[t]$ of
degree at most $n - 1$ with $\mathrm{adj}(A) = p(A)$. Conclude that $A^{-1} = q(A)$
holds for a polynomial $q \in R[t]$ of degree at most $n - 1$.

(b) Let $A \in R^{n,n}$. Apply Theorem 7.18 to the matrix $tI_n - A \in (R[t])^{n,n}$
and derive an alternative proof of the Cayley-Hamilton theorem from the
formula $\det(tI_n - A)\, I_n = (tI_n - A)\, \mathrm{adj}(tI_n - A)$.

8.3 Let $A \in K^{n,n}$ be a matrix with $A^k = 0$ for some $k \in \mathbb{N}$. (Such a matrix is
called nilpotent.)

(a) Show that $\lambda = 0$ is the only eigenvalue of $A$.
(b) Determine $P_A$ and show that $A^n = 0$.
(Hint: You may assume that $P_A$ has the form $\prod_{i=1}^{n} (t - \lambda_i)$ for some $\lambda_1, \dots, \lambda_n \in K$.)
(c) Show that $\mu I_n - A$ is invertible if and only if $\mu \in K \setminus \{0\}$.
(d) Show that $(I_n - A)^{-1} = I_n + A + A^2 + \dots + A^{n-1}$.

8.4 Determine the eigenvalues and corresponding eigenvectors of the following
matrices over $\mathbb{R}$:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix},$$

together with a $3 \times 3$ matrix $B$ and a $4 \times 4$ matrix $C$ whose entries are not
recoverable from the source.

Is there any difference when you consider $A$, $B$, $C$ as matrices over $\mathbb{C}$?
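8.5 Let $n \geq 3$ and $\varepsilon \in \mathbb{R}$. Consider the matrix $A(\varepsilon)$ (the display of this
matrix is missing in the source) as an element of $\mathbb{C}^{n,n}$ and determine all
eigenvalues in dependence of $\varepsilon$. How many pairwise distinct eigenvalues
does $A(\varepsilon)$ have?

8.6 Determine the eigenvalues and corresponding eigenvectors of

$$A = \begin{bmatrix} \ast & \ast & \ast \\ 0 & 4 - a & 2 - a \\ 0 & -4 + 2a & -2 + 2a \end{bmatrix} \in \mathbb{R}^{3,3}, \quad B = \begin{bmatrix} \ast & \ast & \ast \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in (\mathbb{Z}/2\mathbb{Z})^{3,3}.$$

(The first rows, marked $\ast$, are not recoverable from the source. For simplicity,
the elements of $\mathbb{Z}/2\mathbb{Z}$ are here denoted by $k$ instead of $[k]$.)

8.7 Let $A \in K^{n,n}$, $B \in K^{m,m}$, $n \geq m$, and $C \in K^{n,m}$ with $\mathrm{rank}(C) = m$ and
$AC = CB$. Show that then every eigenvalue of $B$ is an eigenvalue of $A$.

8.8 Show the following assertions:

(a) $\mathrm{trace}(\lambda A + \mu B) = \lambda\, \mathrm{trace}(A) + \mu\, \mathrm{trace}(B)$ holds for all $\lambda, \mu \in K$ and
$A, B \in K^{n,n}$.

(b) $\mathrm{trace}(AB) = \mathrm{trace}(BA)$ holds for all $A, B \in K^{n,n}$.

(c) If $A, B \in K^{n,n}$ are similar, then $\mathrm{trace}(A) = \mathrm{trace}(B)$.

8.9 Prove or disprove the following statements:

(a) There exist matrices $A, B \in K^{n,n}$ with $\mathrm{trace}(AB) \neq \mathrm{trace}(A)\, \mathrm{trace}(B)$.
(b) There exist matrices $A, B \in K^{n,n}$ with $AB - BA = I_n$.

8.10 Suppose that the matrix $A = [a_{ij}] \in \mathbb{C}^{n,n}$ has only real entries $a_{ij}$. Show
that if $\lambda \in \mathbb{C} \setminus \mathbb{R}$ is an eigenvalue of $A$ with corresponding eigenvector $v =
[\nu_1, \dots, \nu_n]^T \in \mathbb{C}^{n,1}$, then also $\overline{\lambda}$ is an eigenvalue of $A$ with corresponding
eigenvector $\overline{v} := [\overline{\nu}_1, \dots, \overline{\nu}_n]^T$.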


Chapter 9 
Vector Spaces 


In the previous chapters we have focussed on matrices and their properties. We have 
defined algebraic operations with matrices and derived important concepts associ- 
ated with them, including their rank, determinant, characteristic polynomial, and 
eigenvalues. In this chapter we place these concepts in a more abstract framework 
by introducing the idea of a vector space. Matrices form one of the most important 
examples of vector spaces, and properties of certain (namely, finite dimensional) 
vector spaces can be studied in a transparent way using matrices. In the next chapter 
we will study (linear) maps between vector spaces, and there the connection with 
matrices will play a central role as well. 


9.1 Basic Definitions and Properties of Vector Spaces 


We begin with the definition of a vector space over a field K. 


Definition 9.1 Let $K$ be a field. A vector space over $K$, or shortly $K$-vector space,
is a set $V$ with two operations,

$$+ : V \times V \to V, \quad (v, w) \mapsto v + w, \quad \text{(addition)}$$

$$\cdot : K \times V \to V, \quad (\lambda, v) \mapsto \lambda \cdot v, \quad \text{(scalar multiplication)}$$

that satisfy the following:

(1) $(V, +)$ is a commutative group.
(2) For all $v, w \in V$ and $\lambda, \mu \in K$ the following assertions hold:

(a) $\lambda \cdot (\mu \cdot v) = (\lambda \mu) \cdot v$.
(b) $1 \cdot v = v$.
(c) $\lambda \cdot (v + w) = \lambda \cdot v + \lambda \cdot w$.
(d) $(\lambda + \mu) \cdot v = \lambda \cdot v + \mu \cdot v$.

An element $v \in V$ is called a vector,¹ an element $\lambda \in K$ is called a scalar.

Again, we usually omit the sign of the scalar multiplication, i.e., we usually write
$\lambda v$ instead of $\lambda \cdot v$. If it is clear from the context (or not important) which field we
are using, we often omit the explicit reference to $K$ and simply write vector space
instead of $K$-vector space.


Example 9.2

(1) The set $K^{n,m}$ with the matrix addition and the scalar multiplication forms a
$K$-vector space. For obvious reasons, the elements of $K^{n,1}$ and $K^{1,m}$ are sometimes
called column and row vectors, respectively.

(2) The set $K[t]$ forms a $K$-vector space, if the addition is defined as in Example 3.17
(usual addition of polynomials) and the scalar multiplication for
$p = \alpha_0 + \alpha_1 t + \dots + \alpha_n t^n \in K[t]$ is defined by

$$\lambda \cdot p := (\lambda \alpha_0) + (\lambda \alpha_1) t + \dots + (\lambda \alpha_n) t^n.$$

(3) The continuous and real valued functions defined on a real interval $[\alpha, \beta]$ with
the pointwise addition and scalar multiplication, i.e.,

$$(f + g)(x) := f(x) + g(x) \quad \text{and} \quad (\lambda \cdot f)(x) := \lambda f(x),$$

form an $\mathbb{R}$-vector space. This can be shown by using that the addition of two
continuous functions as well as the multiplication of a continuous function by
a real number yield again a continuous function.

Since, by definition, $(V, +)$ is a commutative group, we already know some vector
space properties from the theory of groups (cp. Chap. 3). In particular, every vector
space contains a unique neutral element (with respect to addition) $0_V$, which is called
the null vector. Every vector $v \in V$ has a unique (additive) inverse $-v \in V$ with
$v + (-v) = v - v = 0_V$. As usual, we will write $v - w$ instead of $v + (-w)$.


Lemma 9.3 Let $V$ be a $K$-vector space. If $0_K$ and $0_V$ are the neutral (null) elements
of $K$ and $V$, respectively, then the following assertions hold:

(1) $0_K \cdot v = 0_V$ for all $v \in V$.
(2) $\lambda \cdot 0_V = 0_V$ for all $\lambda \in K$.
(3) $-(\lambda \cdot v) = (-\lambda) \cdot v = \lambda \cdot (-v)$ for all $v \in V$ and $\lambda \in K$.

Proof

(1) For all $v \in V$ we have $0_K \cdot v = (0_K + 0_K) \cdot v = 0_K \cdot v + 0_K \cdot v$. Adding $-(0_K \cdot v)$
on both sides of this identity gives $0_V = 0_K \cdot v$.

(2) For all $\lambda \in K$ we have $\lambda \cdot 0_V = \lambda \cdot (0_V + 0_V) = \lambda \cdot 0_V + \lambda \cdot 0_V$. Adding $-(\lambda \cdot 0_V)$
on both sides of this identity gives $0_V = \lambda \cdot 0_V$.

(3) For all $\lambda \in K$ and $v \in V$ we have $\lambda \cdot v + (-\lambda) \cdot v = (\lambda - \lambda) \cdot v = 0_K \cdot v = 0_V$,
as well as $\lambda \cdot v + \lambda \cdot (-v) = \lambda \cdot (v - v) = \lambda \cdot 0_V = 0_V$. □

In the following we will write 0 instead of $0_K$ and $0_V$ when it is clear which null
element is meant.

¹ This term was introduced in 1845 by Sir William Rowan Hamilton (1805–1865) in the context of
his quaternions. It is motivated by the Latin verb “vehi” (“vehor”, “vectus sum”) which means to
ride or drive. Also the term “scalar” was introduced by Hamilton; see the footnote on the scalar
multiplication (4.2).

As in groups, rings and fields we can identify substructures in vector spaces that 
are again vector spaces. 


Definition 9.4 Let $(V, +, \cdot)$ be a $K$-vector space and let $U \subseteq V$. If $(U, +, \cdot)$ is a
$K$-vector space, then it is called a subspace of $(V, +, \cdot)$.

A substructure must be closed with respect to the given operations, which here
are addition and scalar multiplication.

Lemma 9.5 $(U, +, \cdot)$ is a subspace of the $K$-vector space $(V, +, \cdot)$ if and only if
$\emptyset \neq U \subseteq V$ and the following assertions hold:

(1) $v + w \in U$ for all $v, w \in U$,
(2) $\lambda v \in U$ for all $\lambda \in K$ and $v \in U$.

Proof Exercise. □

Example 9.6

(1) Every vector space $V$ has the trivial subspaces $U = V$ and $U = \{0\}$.
(2) Let $A \in K^{n,m}$ and $U = \mathcal{L}(A, 0) \subseteq K^{m,1}$, i.e., $U$ is the solution set of the
homogeneous linear system $Ax = 0$. We have $0 \in U$, so $U$ is not empty. If
$v, w \in U$, then

$$A(v + w) = Av + Aw = 0 + 0 = 0,$$

i.e., $v + w \in U$. Furthermore, for all $\lambda \in K$,

$$A(\lambda v) = \lambda (Av) = \lambda 0 = 0,$$

i.e., $\lambda v \in U$. Hence, $U$ is a subspace of $K^{m,1}$.
(3) For every $n \in \mathbb{N}_0$ the set $K[t]_{\leq n} := \{p \in K[t] \mid \deg(p) \leq n\}$ is a subspace of
$K[t]$.
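
Item (2) of Example 9.6 can be checked on a concrete case. The sketch below uses a hypothetical 2 x 3 matrix over Q and two solutions v, w of Ax = 0, and confirms that v + w and a scalar multiple of v are again solutions. The matrix, the vectors and the helper names are assumptions for this illustration; a check on examples does not replace the general argument.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def in_solution_set(A, v):
    """True if v solves the homogeneous system A x = 0."""
    return all(entry == 0 for entry in mat_vec(A, v))


A = [[1, 2, 3], [2, 4, 6]]
v = [-2, 1, 0]
w = [-3, 0, 1]
lam = Fraction(7, 3)

v_plus_w = [a + b for a, b in zip(v, w)]
lam_v = [lam * a for a in v]
closed = in_solution_set(A, v_plus_w) and in_solution_set(A, lam_v)
```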


Definition 9.7 Let $V$ be a $K$-vector space, $n \in \mathbb{N}$, and $v_1, \dots, v_n \in V$. A vector of
the form

$$\lambda_1 v_1 + \dots + \lambda_n v_n = \sum_{i=1}^{n} \lambda_i v_i \in V$$

is called a linear combination of $v_1, \dots, v_n$ with the coefficients $\lambda_1, \dots, \lambda_n \in K$.
The (linear) span of $v_1, \dots, v_n$ is the set

$$\mathrm{span}\{v_1, \dots, v_n\} := \Big\{ \sum_{i=1}^{n} \lambda_i v_i \;\Big|\; \lambda_1, \dots, \lambda_n \in K \Big\}.$$

Let $M$ be a set and suppose that for every $m \in M$ we have a vector $v_m \in V$. Let
the set of all these vectors, called the system of these vectors, be denoted by $\{v_m\}_{m \in M}$.
Then the (linear) span of the system $\{v_m\}_{m \in M}$, denoted by $\mathrm{span}\,\{v_m\}_{m \in M}$, is defined
as the set of all vectors $v \in V$ that are linear combinations of finitely many vectors
of the system.

This definition can be consistently extended to the case $n = 0$. In this case
$v_1, \dots, v_n$ is a list of length zero, or an empty list. If we define the empty sum of
vectors as $0 \in V$, then we obtain $\mathrm{span}\{v_1, \dots, v_n\} = \mathrm{span}\, \emptyset = \{0\}$.

If in the following we consider a list of vectors $v_1, \dots, v_n$ or a set of vectors
$\{v_1, \dots, v_n\}$, we usually mean that $n \geq 1$. The case of the empty list and the associated
zero vector space $V = \{0\}$ will sometimes be discussed separately.

Example 9.8 The vector space $K^{1,3} = \{[\alpha_1, \alpha_2, \alpha_3] \mid \alpha_1, \alpha_2, \alpha_3 \in K\}$ is spanned
by the vectors $[1, 0, 0]$, $[0, 1, 0]$, $[0, 0, 1]$. The set $\{[\alpha_1, \alpha_2, 0] \mid \alpha_1, \alpha_2 \in K\}$ forms
a subspace of $K^{1,3}$ that is spanned by the vectors $[1, 0, 0]$, $[0, 1, 0]$.

Lemma 9.9 If $V$ is a vector space and $v_1, \dots, v_n \in V$, then $\mathrm{span}\{v_1, \dots, v_n\}$ is a
subspace of $V$.

Proof It is clear that $\emptyset \neq \mathrm{span}\{v_1, \dots, v_n\} \subseteq V$. Furthermore, $\mathrm{span}\{v_1, \dots, v_n\}$ is
by definition closed with respect to addition and scalar multiplication, so that (1) and
(2) in Lemma 9.5 are satisfied. □


9.2 Bases and Dimension of Vector Spaces 


We will now discuss the central theory of bases and dimension of vector spaces, and 
start with the concept of linear independence. 


Definition 9.10 Let $V$ be a $K$-vector space.

(1) The vectors $v_1, \dots, v_n \in V$ are called linearly independent if the equation

$$\sum_{i=1}^{n} \lambda_i v_i = 0 \quad \text{with } \lambda_1, \dots, \lambda_n \in K$$

always implies that $\lambda_1 = \dots = \lambda_n = 0$. Otherwise, i.e., when $\sum_{i=1}^{n} \lambda_i v_i = 0$
holds for some scalars $\lambda_1, \dots, \lambda_n \in K$ that are not all equal to zero, then the
vectors $v_1, \dots, v_n$ are called linearly dependent.

(2) The empty list is linearly independent.

(3) If $M$ is a set and for every $m \in M$ we have a vector $v_m \in V$, the corresponding
system $\{v_m\}_{m \in M}$ is called linearly independent when finitely many vectors of
the system are always linearly independent in the sense of (1). Otherwise the
system is called linearly dependent.

The vectors $v_1, \dots, v_n$ are linearly independent if and only if the zero vector can
be linearly combined only in the trivial way $0 = 0 \cdot v_1 + \dots + 0 \cdot v_n$. Consequently,
if one of these vectors is the zero vector, then $v_1, \dots, v_n$ are linearly dependent. A
single vector $v$ is linearly independent if and only if $v \neq 0$.

The following result gives a useful characterization of the linear independence of
finitely many (but at least two) given vectors.

Lemma 9.11 The vectors $v_1, \dots, v_n$, $n \geq 2$, are linearly independent if and only if
no vector $v_i$, $i = 1, \dots, n$, can be written as a linear combination of the others.

Proof We prove the assertion by contraposition. The vectors $v_1, \dots, v_n$ are linearly
dependent if and only if

$$\sum_{i=1}^{n} \lambda_i v_i = 0$$

with at least one scalar $\lambda_j \neq 0$. Equivalently,

$$v_j = - \sum_{\substack{i=1 \\ i \neq j}}^{n} (\lambda_j^{-1} \lambda_i)\, v_i,$$

so that $v_j$ is a linear combination of the other vectors. □
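
For vectors in Q^n given by their entries, linear independence can be tested mechanically. The sketch below uses Gaussian elimination with exact fractions to compute the rank of the matrix whose rows are the given vectors; the vectors are linearly independent exactly when the rank equals their number. Elimination is the technique assumed for this example (the definition itself does not prescribe a method), and the function names are illustrative.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (elimination over Q)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def linearly_independent(vectors):
    """True if only the trivial linear combination of the vectors gives zero."""
    return rank(vectors) == len(vectors)


independent = linearly_independent([[1, 2], [3, 4]])
dependent = linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]])  # v3 = v1 + v2
```

With this convention the empty list is linearly independent and a single zero vector is linearly dependent, in line with Definition 9.10 and the remarks following it.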


Using the concept of linear independence we can now define the concept of the 
basis of a vector space. 


Definition 9.12 Let $V$ be a vector space.

(1) A set $\{v_1, \dots, v_n\} \subseteq V$ is called a basis of $V$, when $v_1, \dots, v_n$ are linearly
independent and $\mathrm{span}\{v_1, \dots, v_n\} = V$.

(2) The set $\emptyset$ is the basis of the zero vector space $V = \{0\}$.

(3) Let $M$ be a set and suppose that for every $m \in M$ we have a vector $v_m \in V$. The
set $\{v_m \mid m \in M\}$ is called a basis of $V$ if the corresponding system $\{v_m\}_{m \in M}$ is
linearly independent and $\mathrm{span}\,\{v_m\}_{m \in M} = V$.

In short, a basis is a linearly independent spanning set of a vector space.

Example 9.13

(1) Let $E_{ij} \in K^{n,m}$ be the matrix with entry 1 in position $(i, j)$ and all other entries 0
(cp. Sect. 5.1). Then the set

$$\{E_{ij} \mid 1 \leq i \leq n \text{ and } 1 \leq j \leq m\} \qquad (9.1)$$

is a basis of the vector space $K^{n,m}$ (cp. (1) in Example 9.2): The matrices $E_{ij} \in
K^{n,m}$, $1 \leq i \leq n$ and $1 \leq j \leq m$, are linearly independent, since

$$0 = \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_{ij} E_{ij} = [\lambda_{ij}]$$

implies that $\lambda_{ij} = 0$ for $i = 1, \dots, n$ and $j = 1, \dots, m$. For any $A = [a_{ij}] \in
K^{n,m}$ we have

$$A = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} E_{ij},$$

and hence

$$\mathrm{span}\{E_{ij} \mid 1 \leq i \leq n \text{ and } 1 \leq j \leq m\} = K^{n,m}.$$

The basis (9.1) is called the canonical or standard basis of the vector space
$K^{n,m}$. For $m = 1$ we denote the canonical basis vectors of $K^{n,1}$ by

$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n := \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$

These vectors are also called unit vectors; they are the $n$ columns of the identity
matrix $I_n$.

(2) A basis of the vector space $K[t]$ (cp. (2) in Example 9.2) is given by the set
$\{t^m \mid m \in \mathbb{N}_0\}$, since the corresponding system $\{t^m\}_{m \in \mathbb{N}_0}$ is linearly independent,
and every polynomial $p \in K[t]$ is a linear combination of finitely many vectors
of the system.


The next result is called the basis extension theorem. 


Theorem 9.14 Let $V$ be a vector space and let $v_1, \dots, v_r, w_1, \dots, w_\ell \in V$, where
$r, \ell \in \mathbb{N}_0$. If $v_1, \dots, v_r$ are linearly independent and $\mathrm{span}\{v_1, \dots, v_r, w_1, \dots, w_\ell\} =
V$, then the set $\{v_1, \dots, v_r\}$ can be extended to a basis of $V$ using vectors from the
set $\{w_1, \dots, w_\ell\}$.

Proof Note that for $r = 0$ the list $v_1, \dots, v_r$ is empty and hence linearly independent
due to (2) in Definition 9.10.

We prove the assertion by induction on $\ell$. If $\ell = 0$, then $\mathrm{span}\{v_1, \dots, v_r\} = V$,
and the linear independence of $\{v_1, \dots, v_r\}$ shows that this set is a basis of $V$.

Let the assertion hold for some $\ell \geq 0$. Suppose that $v_1, \dots, v_r, w_1, \dots, w_{\ell+1} \in V$
are given, where $v_1, \dots, v_r$ are linearly independent and $\mathrm{span}\{v_1, \dots, v_r, w_1, \dots,
w_{\ell+1}\} = V$. If $\{v_1, \dots, v_r\}$ already is a basis of $V$, then we are done. Suppose,
therefore, that $\mathrm{span}\{v_1, \dots, v_r\} \subsetneq V$. Then there exists at least one $j$, $1 \leq j \leq \ell + 1$,
such that $w_j \notin \mathrm{span}\{v_1, \dots, v_r\}$. In particular, we have $w_j \neq 0$. Then

$$\lambda w_j + \sum_{i=1}^{r} \lambda_i v_i = 0$$

implies that $\lambda = 0$ (otherwise we would have $w_j \in \mathrm{span}\{v_1, \dots, v_r\}$) and,
therefore, $\lambda_1 = \dots = \lambda_r = 0$ due to the linear independence of $v_1, \dots, v_r$.
Thus, $v_1, \dots, v_r, w_j$ are linearly independent. By the induction hypothesis we
can extend the set $\{v_1, \dots, v_r, w_j\}$ to a basis of $V$ using vectors from the set
$\{w_1, \dots, w_{\ell+1}\} \setminus \{w_j\}$, which contains $\ell$ elements. □

Example 9.15 Consider the vector space $V = K[t]_{\leq 3}$ (cp. (3) in Example 9.6) and
the vectors $v_1 = t$, $v_2 = t^2$, $v_3 = t^3$. These vectors are linearly independent,
but $\{v_1, v_2, v_3\}$ is not a basis of $V$, since $\mathrm{span}\{v_1, v_2, v_3\} \neq V$. For example, the
vectors $w_1 = t^2 + 1$ and $w_2 = t^3 - t^2 - 1$ are elements of $V$, but $w_1, w_2 \notin
\mathrm{span}\{v_1, v_2, v_3\}$. We have $\mathrm{span}\{v_1, v_2, v_3, w_1, w_2\} = V$. If we extend $\{v_1, v_2, v_3\}$ by
$w_1$, then we get the linearly independent vectors $v_1, v_2, v_3, w_1$ which indeed span $V$.
Thus, $\{v_1, v_2, v_3, w_1\}$ is a basis of $V$.
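
The basis extension theorem suggests a simple greedy procedure: go through the candidates w_1, ..., w_l and keep each one that is not in the span of the vectors collected so far. The sketch below applies this to Example 9.15, representing a polynomial c_0 + c_1 t + c_2 t^2 + c_3 t^3 by its coefficient vector [c_0, c_1, c_2, c_3]. The greedy loop is a variation of the inductive proof, the membership test via the rank is an assumption of this example, and the function names are illustrative.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (elimination over Q)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def extend_to_basis(vs, ws):
    """Extend the linearly independent list vs by vectors from ws, greedily."""
    basis = list(vs)
    for w in ws:
        # w enlarges the span exactly when it raises the rank
        if rank(basis + [w]) > len(basis):
            basis.append(w)
    return basis


v1, v2, v3 = [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]   # t, t^2, t^3
w1, w2 = [1, 0, 1, 0], [-1, 0, -1, 1]                   # t^2 + 1, t^3 - t^2 - 1
basis = extend_to_basis([v1, v2, v3], [w1, w2])
```

The result is the list v_1, v_2, v_3, w_1 from Example 9.15; the vector w_2 is skipped because it already lies in the span of these four vectors.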


By the basis extension theorem every vector space that is spanned by finitely many 
vectors has a basis consisting of finitely many elements. A central result of the theory 
of vector spaces is that every such basis has the same number of elements. In order 
to show this result we first prove the following exchange lemma. 


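Lemma 9.16 Let $V$ be a vector space, let $v_1, \dots, v_m \in V$ and let $w = \sum_{j=1}^{m} \lambda_j v_j \in
V$ with $\lambda_1 \neq 0$. Then $\mathrm{span}\{w, v_2, \dots, v_m\} = \mathrm{span}\{v_1, v_2, \dots, v_m\}$.

Proof By assumption we have

$$v_1 = \lambda_1^{-1} w - \sum_{j=2}^{m} (\lambda_1^{-1} \lambda_j)\, v_j.$$

If $y \in \mathrm{span}\{v_1, \dots, v_m\}$, say $y = \sum_{i=1}^{m} \gamma_i v_i$, then

$$
\begin{aligned}
y &= \gamma_1 \Big( \lambda_1^{-1} w - \sum_{i=2}^{m} (\lambda_1^{-1} \lambda_i)\, v_i \Big) + \sum_{i=2}^{m} \gamma_i v_i \\
&= (\gamma_1 \lambda_1^{-1})\, w + \sum_{i=2}^{m} (\gamma_i - \gamma_1 \lambda_1^{-1} \lambda_i)\, v_i \in \mathrm{span}\{w, v_2, \dots, v_m\}.
\end{aligned}
$$

If, on the other hand, $y = \alpha_1 w + \sum_{i=2}^{m} \alpha_i v_i \in \mathrm{span}\{w, v_2, \dots, v_m\}$, then

$$
\begin{aligned}
y &= \alpha_1 \Big( \sum_{i=1}^{m} \lambda_i v_i \Big) + \sum_{i=2}^{m} \alpha_i v_i \\
&= \alpha_1 \lambda_1 v_1 + \sum_{i=2}^{m} (\alpha_1 \lambda_i + \alpha_i)\, v_i \in \mathrm{span}\{v_1, \dots, v_m\},
\end{aligned}
$$

and thus $\mathrm{span}\{w, v_2, \dots, v_m\} = \mathrm{span}\{v_1, v_2, \dots, v_m\}$. □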


Using this lemma we now prove the exchange theorem.” 


Theorem 9.17 Let W = {w1,..., Wn} and U = {u], ..., Um} be finite subsets of a 
vector space, and let w1, ..., Wn be linearly independent. IfW C span{uy, ..., Um}, 
thenn < m, andn elements of U, if numbered appropriately the elements u,, ..., Un, 
can be exchanged against n elements of W in such a way that 


Span {Wis 2155 Wn, Un+1, «sco Um} = Span{ui, ..., Un, Un+l, ---, Um}. 
Proof By assumption we have $w_1 = \sum_{j=1}^{m} \lambda_j u_j$ for some scalars $\lambda_1, \dots, \lambda_m$ that are not all zero (otherwise $w_1 = 0$, which contradicts the linear independence of $w_1, \dots, w_n$). After an appropriate renumbering we have $\lambda_1 \neq 0$, and Lemma 9.16 yields

\[ \mathrm{span}\{w_1, u_2, \dots, u_m\} = \mathrm{span}\{u_1, u_2, \dots, u_m\}. \]

Suppose that for some $r$, $1 \leq r \leq n-1$, we have exchanged the vectors $u_1, \dots, u_r$ against $w_1, \dots, w_r$ so that

\[ \mathrm{span}\{w_1, \dots, w_r, u_{r+1}, \dots, u_m\} = \mathrm{span}\{u_1, \dots, u_r, u_{r+1}, \dots, u_m\}. \]

It is then clear that $r \leq m$.

By assumption we have $w_{r+1} \in \mathrm{span}\{u_1, \dots, u_m\}$, and thus

\[ w_{r+1} = \sum_{i=1}^{r} \lambda_i w_i + \sum_{i=r+1}^{m} \lambda_i u_i \]

for some scalars $\lambda_1, \dots, \lambda_m$. One of the scalars $\lambda_{r+1}, \dots, \lambda_m$ must be nonzero (otherwise $w_{r+1} \in \mathrm{span}\{w_1, \dots, w_r\}$, which contradicts the linear independence of $w_1, \dots, w_n$). After an appropriate renumbering we have $\lambda_{r+1} \neq 0$, and Lemma 9.16 yields

\[ \mathrm{span}\{w_1, \dots, w_{r+1}, u_{r+2}, \dots, u_m\} = \mathrm{span}\{w_1, \dots, w_r, u_{r+1}, \dots, u_m\}. \]


If we continue this construction until $r = n - 1$, then we obtain

\[ \mathrm{span}\{w_1, \dots, w_n, u_{n+1}, \dots, u_m\} = \mathrm{span}\{u_1, \dots, u_n, u_{n+1}, \dots, u_m\}, \]

where in particular $n \leq m$. □

* In the literature, this theorem is sometimes called the Steinitz exchange theorem after Ernst Steinitz (1871-1928). The result was first proved in 1862 by Hermann Günther Graßmann (1809-1877).


Using this fundamental theorem, the following result about the unique number of 
basis elements is a simple corollary. 


Corollary 9.18 If a vector space $V$ is spanned by finitely many vectors, then $V$ has a basis consisting of finitely many elements, and any two bases of $V$ have the same number of elements.


Proof The assertion is clear for $V = \{0\}$ (cp. (2) in Definition 9.12). Let $V = \mathrm{span}\{v_1, \dots, v_m\}$ with $v_1 \neq 0$. By Theorem 9.14, we can extend $\{v_1\}$ using elements of $\{v_2, \dots, v_m\}$ to a basis of $V$. Thus, $V$ has a basis with finitely many elements. Let $U := \{u_1, \dots, u_\ell\}$ and $W := \{w_1, \dots, w_k\}$ be two such bases. Then Theorem 9.17 yields

\[ W \subseteq V = \mathrm{span}\{u_1, \dots, u_\ell\} \;\Longrightarrow\; k \leq \ell, \]
\[ U \subseteq V = \mathrm{span}\{w_1, \dots, w_k\} \;\Longrightarrow\; \ell \leq k, \]

and thus $\ell = k$. □
We can now define the dimension of a vector space. 


Definition 9.19 If there exists a basis of a $K$-vector space $V$ that consists of finitely many elements, then $V$ is called finite dimensional, and the unique number of basis elements is called the dimension of $V$. We denote the dimension by $\dim_K(V)$ or $\dim(V)$, if it is clear which field is meant.

If $V$ is not spanned by finitely many vectors, then $V$ is called infinite dimensional, and we write $\dim_K(V) = \infty$.


Note that the zero vector space V = {0} has the basis Ø and thus it has dimension 
zero (cp. (2) in Definition 9.12). 

If $V$ is a finite dimensional vector space and if $v_1, \dots, v_m \in V$ with $m > \dim(V)$, then the vectors $v_1, \dots, v_m$ must be linearly dependent. (If these vectors were linearly independent, then we could extend them via Theorem 9.14 to a basis of $V$ that would contain more than $\dim(V)$ elements.)


Example 9.20 The set in (9.1) forms a basis of the vector space $K^{n,m}$. This basis has $n \cdot m$ elements, and hence $\dim(K^{n,m}) = n \cdot m$. On the other hand, the vector space $K[t]$ is not spanned by finitely many vectors (cp. (2) in Example 9.13) and hence it is infinite dimensional.


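Example 9.21 Let $V$ be the vector space of continuous and real valued functions on the real interval $[0, 1]$ (cp. (3) in Example 9.2). Define for $n = 1, 2, \dots$ the function $f_n \in V$ by

\[ f_n(x) = \begin{cases} 0, & x < \frac{1}{n+1}, \\ 0, & \frac{1}{n} < x, \\ 2n(n+1)x - 2n, & \frac{1}{n+1} \leq x \leq \frac{1}{2}\left(\frac{1}{n} + \frac{1}{n+1}\right), \\ -2n(n+1)x + 2n + 2, & \frac{1}{2}\left(\frac{1}{n} + \frac{1}{n+1}\right) < x \leq \frac{1}{n}. \end{cases} \]

Every linear combination $\sum_{j=1}^{k} \lambda_j f_j$ is a continuous function that has the value $\lambda_j$ at $\frac{1}{2}\left(\frac{1}{j} + \frac{1}{j+1}\right)$. Thus, the equation $\sum_{j=1}^{k} \lambda_j f_j = 0 \in V$ implies that all $\lambda_j$ must be zero, so that $f_1, \dots, f_k \in V$ are linearly independent for all $k \in \mathbb{N}$. Consequently, $\dim(V) = \infty$.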
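The key step in Example 9.21 is that each hat function $f_j$ takes the value 1 at the midpoint of its own support interval and the value 0 at the midpoints of all the others. A small sketch with exact rational arithmetic illustrates this for the first few functions. The names `f`, `peak` and `table` are chosen here for illustration only.

```python
from fractions import Fraction

def f(n, x):
    """Hat function f_n of Example 9.21, evaluated exactly at a rational x."""
    left, right = Fraction(1, n + 1), Fraction(1, n)
    mid = (left + right) / 2
    if x < left or x > right:
        return Fraction(0)
    if x <= mid:
        return 2 * n * (n + 1) * x - 2 * n        # rising branch
    return -2 * n * (n + 1) * x + 2 * n + 2       # falling branch

def peak(j):
    """Midpoint of the support [1/(j+1), 1/j] of f_j."""
    return (Fraction(1, j) + Fraction(1, j + 1)) / 2

k = 5
# table[j-1][n-1] = f_n(peak(j)); it should be the k x k identity matrix
table = [[f(n, peak(j)) for n in range(1, k + 1)] for j in range(1, k + 1)]
for row in table:
    print([int(entry) for entry in row])
```

Evaluating $\sum_j \lambda_j f_j = 0$ at `peak(j)` therefore isolates $\lambda_j$, which is exactly the independence argument of the example.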


9.3 Coordinates and Changes of the Basis 


We will now study the linear combinations of basis vectors of a finite dimensional 
vector space. In particular, we will study what happens with a linear combination if 
we change to another basis of the vector space. 


Lemma 9.22 If $\{v_1, \dots, v_n\}$ is a basis of a $K$-vector space $V$, then for every $v \in V$ there exist uniquely determined scalars $\lambda_1, \dots, \lambda_n \in K$ with $v = \lambda_1 v_1 + \dots + \lambda_n v_n$. These scalars are called the coordinates of $v$ with respect to the basis $\{v_1, \dots, v_n\}$.


Proof Let $v = \sum_{i=1}^{n} \lambda_i v_i = \sum_{i=1}^{n} \mu_i v_i$ for some scalars $\lambda_i, \mu_i \in K$, $i = 1, \dots, n$, then

\[ 0 = v - v = \sum_{i=1}^{n} (\lambda_i - \mu_i)\, v_i. \]

The linear independence of $v_1, \dots, v_n$ implies that $\lambda_i = \mu_i$ for $i = 1, \dots, n$. □


By definition, the coordinates of a vector depend on the given basis. In particular, they depend on the ordering (or numbering) of the basis vectors. Because of this, some authors distinguish between the basis as “set”, i.e., a collection of elements without a particular ordering, and an “ordered basis”. In this book we will keep the set notation for a basis $\{v_1, \dots, v_n\}$, where the indices indicate the ordering of the basis vectors.

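Let $V$ be a $K$-vector space, $v_1, \dots, v_n \in V$ (they need not be linearly independent) and

\[ v = \lambda_1 v_1 + \dots + \lambda_n v_n \]

for some coefficients $\lambda_1, \dots, \lambda_n \in K$. Let us write

\[ (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} := \lambda_1 v_1 + \dots + \lambda_n v_n. \tag{9.2} \]

Here $(v_1, \dots, v_n)$ is an $n$-tuple over $V$, i.e.,

\[ (v_1, \dots, v_n) \in V^n = \underbrace{V \times \dots \times V}_{n \text{ times}}. \]

For $n = 1$ we have $V^1 = V$. We then skip the parentheses and write $v$ instead of $(v)$ for a 1-tuple. The notation (9.2) formally defines a “multiplication” as a map from $V^n \times K^{n,1}$ to $V$.

For all $\alpha \in K$ we have

\[ \alpha \cdot v = (\alpha \cdot \lambda_1) v_1 + \dots + (\alpha \cdot \lambda_n) v_n = (v_1, \dots, v_n) \begin{bmatrix} \alpha \lambda_1 \\ \vdots \\ \alpha \lambda_n \end{bmatrix}. \]

If $\mu_1, \dots, \mu_n \in K$ and

\[ u = \mu_1 v_1 + \dots + \mu_n v_n = (v_1, \dots, v_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}, \]

then

\[ v + u = (\lambda_1 + \mu_1) v_1 + \dots + (\lambda_n + \mu_n) v_n = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 + \mu_1 \\ \vdots \\ \lambda_n + \mu_n \end{bmatrix}. \]

This shows that if vectors are given by linear combinations, then the operations scalar multiplication and addition correspond to operations with the coefficients of the vectors with respect to the linear combinations.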

We can further extend this notation. Let $A = [a_{ij}] \in K^{n,m}$ and let

\[ u_j = (v_1, \dots, v_n) \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix}, \quad j = 1, \dots, m. \]

Then we write the $m$ linear combinations for $u_1, \dots, u_m$ as the system

\[ (u_1, \dots, u_m) = (v_1, \dots, v_n) A. \tag{9.3} \]

On both sides of this equation we have elements of $V^m$. The right-multiplication of an arbitrary $n$-tuple $(v_1, \dots, v_n) \in V^n$ with a matrix $A \in K^{n,m}$ thus corresponds to forming $m$ linear combinations of the vectors $v_1, \dots, v_n$, with the corresponding coefficients given by the entries of $A$. Formally, this defines a “multiplication” as a map from $V^n \times K^{n,m}$ to $V^m$.


Lemma 9.23 Let $V$ be a $K$-vector space, let $v_1, \dots, v_n \in V$ be linearly independent, let $A \in K^{n,m}$, and let $(u_1, \dots, u_m) = (v_1, \dots, v_n) A$. Then the vectors $u_1, \dots, u_m$ are linearly independent if and only if $\mathrm{rank}(A) = m$.


Proof Exercise. □


Now consider also a matrix $B = [b_{ij}] \in K^{m,\ell}$. Using (9.3) we obtain

\[ (u_1, \dots, u_m) B = ((v_1, \dots, v_n) A) B. \]

Lemma 9.24 In the previous notation,

\[ ((v_1, \dots, v_n) A) B = (v_1, \dots, v_n)(A B). \]


Proof Exercise. □
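The “multiplication” of a tuple of vectors with a matrix in (9.3) can be made concrete as follows. This is a minimal sketch that assumes the vectors live in $\mathbb{Q}^3$ and are stored as Python tuples. The function names `combine` and `matmul` are illustrative. It also checks the identity of Lemma 9.24 on one example, which is of course an illustration and not a proof.

```python
def combine(vectors, A):
    """(v_1, ..., v_n) A: column j of A holds the coefficients of u_j."""
    n, m = len(A), len(A[0])
    assert len(vectors) == n, "A needs one row per vector"
    dim = len(vectors[0])
    return [tuple(sum(A[i][j] * vectors[i][c] for i in range(n)) for c in range(dim))
            for j in range(m)]

def matmul(A, B):
    """Ordinary matrix product of A (n x m) and B (m x l)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

vs = [(1, 0, 2), (0, 1, 1)]      # n = 2 vectors (assumed example data)
A = [[1, 2, 0], [3, -1, 1]]      # A in K^{2,3}
B = [[1, 0], [2, 1], [0, 4]]     # B in K^{3,2}

lhs = combine(combine(vs, A), B)     # ((v_1, v_2) A) B
rhs = combine(vs, matmul(A, B))      # (v_1, v_2) (A B)
print(lhs, rhs)
```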


Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of $V$ and let $v \in V$. By Lemma 9.22 there exist (unique) coordinates $\lambda_1, \dots, \lambda_n$ and $\mu_1, \dots, \mu_n$, respectively, with

\[ v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}. \]

We will now describe a method for transforming the coordinates $\lambda_1, \dots, \lambda_n$ with respect to the basis $\{v_1, \dots, v_n\}$ into the coordinates $\mu_1, \dots, \mu_n$ with respect to the basis $\{w_1, \dots, w_n\}$, and vice versa.

For every basis vector $v_j$, $j = 1, \dots, n$, there exist (unique) coordinates $p_{ij}$, $i = 1, \dots, n$, such that

\[ v_j = (w_1, \dots, w_n) \begin{bmatrix} p_{1j} \\ \vdots \\ p_{nj} \end{bmatrix}, \quad j = 1, \dots, n. \]

Defining $P = [p_{ij}] \in K^{n,n}$ we can write these $n$ equations for the vectors $v_j$ analogous to (9.3) as

\[ (v_1, \dots, v_n) = (w_1, \dots, w_n) P. \tag{9.4} \]

In the same way, for every basis vector $w_j$, $j = 1, \dots, n$, there exist (unique) coordinates $q_{ij}$, $i = 1, \dots, n$, such that

\[ w_j = (v_1, \dots, v_n) \begin{bmatrix} q_{1j} \\ \vdots \\ q_{nj} \end{bmatrix}, \quad j = 1, \dots, n. \]

If we set $Q = [q_{ij}] \in K^{n,n}$, then analogously to (9.4) we get

\[ (w_1, \dots, w_n) = (v_1, \dots, v_n) Q. \]

Thus,

\[ (w_1, \dots, w_n) = (v_1, \dots, v_n) Q = ((w_1, \dots, w_n) P) Q = (w_1, \dots, w_n)(P Q), \]

which implies that

\[ (w_1, \dots, w_n)(I_n - P Q) = (0, \dots, 0). \]

This means that the $n$ linear combinations of the basis vectors $w_1, \dots, w_n$, with their corresponding coordinates given by the entries of the $n$ columns of $I_n - PQ$, are all equal to the zero vector. Since the basis vectors are linearly independent, all coordinates must be zero, and hence $I_n - PQ = 0 \in K^{n,n}$, or $PQ = I_n$. Analogously we obtain the equation $QP = I_n$. Therefore the matrix $P \in K^{n,n}$ is invertible with $P^{-1} = Q$. Furthermore, we have

\[ v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = ((w_1, \dots, w_n) P) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \left( P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} \right). \]

Due to the uniqueness of the coordinates of $v$ with respect to the basis $\{w_1, \dots, w_n\}$ we obtain

\[ \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}, \quad \text{or} \quad \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = P^{-1} \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}. \]

Hence a multiplication with the matrix $P$ transforms the coordinates of $v$ with respect to the basis $\{v_1, \dots, v_n\}$ into those with respect to the basis $\{w_1, \dots, w_n\}$; a multiplication with $P^{-1}$ yields the inverse transformation. Therefore, $P$ and $P^{-1}$ are called coordinate transformation matrices.

We can summarize the results obtained above as follows. 


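Theorem 9.25 Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of a $K$-vector space $V$. Then the uniquely determined matrix $P \in K^{n,n}$ in (9.4) is invertible and yields the coordinate transformation from $\{v_1, \dots, v_n\}$ to $\{w_1, \dots, w_n\}$: If

\[ v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}, \]

then

\[ \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}. \]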


Example 9.26 Consider the vector space $V = \mathbb{R}^2 = \{(\alpha_1, \alpha_2) \mid \alpha_1, \alpha_2 \in \mathbb{R}\}$ with the entrywise addition and scalar multiplication. A basis of $V$ is given by the set $\{e_1 = (1, 0),\ e_2 = (0, 1)\}$, and we have $(\alpha_1, \alpha_2) = \alpha_1 e_1 + \alpha_2 e_2$ for all $(\alpha_1, \alpha_2) \in V$. Another basis of $V$ is the set $\{v_1 = (1, 1),\ v_2 = (1, 2)\}$. The corresponding coordinate transformation matrices can be obtained from the defining equations $(v_1, v_2) = (e_1, e_2) P$ and $(e_1, e_2) = (v_1, v_2) Q$ as

\[ P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad Q = P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}. \]
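The matrices of Example 9.26 follow from the defining equations: the columns of $P$ hold the coordinates of $v_1 = (1,1)$ and $v_2 = (1,2)$ with respect to $\{e_1, e_2\}$, and $Q$ is the inverse of $P$. The short sketch below checks $PQ = QP = I_2$ and converts the coordinates of one sample vector in both directions. The sample coordinates `lam` and the helper names are assumptions made for illustration.

```python
P = [[1, 1], [1, 2]]      # (v1, v2) = (e1, e2) P
Q = [[2, -1], [-1, 1]]    # (e1, e2) = (v1, v2) Q

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

identity = [[1, 0], [0, 1]]
inverse_ok = matmul(P, Q) == identity and matmul(Q, P) == identity

lam = [3, -2]             # coordinates w.r.t. {v1, v2}: v = 3*v1 - 2*v2
mu = matvec(P, lam)       # coordinates of the same v w.r.t. {e1, e2}
back = matvec(Q, mu)      # transforming back recovers lam
print(inverse_ok, mu, back)
```

Here $v = 3 v_1 - 2 v_2 = (1, -1)$, so `mu` is simply the pair of entries of $v$, as expected for the standard basis.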


9.4 Relations Between Vector Spaces and Their Dimensions 


Our first result describes the relation between a vector space and a subspace. 


Lemma 9.27 If $V$ is a finite dimensional vector space and $U \subseteq V$ is a subspace, then $\dim(U) \leq \dim(V)$ with equality if and only if $U = V$.

Proof Let $U \subseteq V$ and let $\{u_1, \dots, u_m\}$ be a basis of $U$, where $\{u_1, \dots, u_m\} = \emptyset$ for $U = \{0\}$. Using Theorem 9.14 we can extend this set to a basis of $V$. If $U$ is a proper subset of $V$, then at least one basis vector needs to be added and hence $\dim(U) < \dim(V)$. If $U = V$, then every basis of $V$ is also a basis of $U$, and thus $\dim(U) = \dim(V)$. □


If $U_1$ and $U_2$ are subspaces of a vector space $V$, then their intersection is given by

\[ U_1 \cap U_2 = \{u \in V \mid u \in U_1 \wedge u \in U_2\} \]

(cp. Definition 2.6). The sum of the two subspaces is defined as

\[ U_1 + U_2 := \{u_1 + u_2 \in V \mid u_1 \in U_1 \wedge u_2 \in U_2\}. \]


Lemma 9.28 If $U_1$ and $U_2$ are subspaces of a vector space $V$, then the following assertions hold:


(1) $U_1 \cap U_2$ and $U_1 + U_2$ are subspaces of $V$.
(2) $U_1 + U_1 = U_1$.
(4) $U_1 \subseteq U_1 + U_2$, with equality if and only if $U_2 \subseteq U_1$.


Proof Exercise. □
An important result is the following dimension formula for subspaces. 


Theorem 9.29 If $U_1$ and $U_2$ are finite dimensional subspaces of a vector space $V$, then

\[ \dim(U_1 \cap U_2) + \dim(U_1 + U_2) = \dim(U_1) + \dim(U_2). \]

Proof Let $\{v_1, \dots, v_r\}$ be a basis of $U_1 \cap U_2$. We extend this set to a basis $\{v_1, \dots, v_r, w_1, \dots, w_\ell\}$ of $U_1$ and to a basis $\{v_1, \dots, v_r, x_1, \dots, x_k\}$ of $U_2$, where we assume that $r, \ell, k \geq 1$. (If one of the lists is empty, then the following argument is easily modified.)

It suffices to show that $\{v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k\}$ is a basis of $U_1 + U_2$. Obviously,

\[ \mathrm{span}\{v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k\} = U_1 + U_2, \]

and hence it suffices to show that $v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k$ are linearly independent. Let

\[ \sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{\ell} \mu_i w_i + \sum_{i=1}^{k} \gamma_i x_i = 0, \]

then

\[ \sum_{i=1}^{k} \gamma_i x_i = - \Big( \sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{\ell} \mu_i w_i \Big). \]

On the left hand side of this equation we have, by definition, a vector in $U_2$; on the right hand side a vector in $U_1$. Therefore, $\sum_{i=1}^{k} \gamma_i x_i \in U_1 \cap U_2$. By construction, however, $\{v_1, \dots, v_r\}$ is a basis of $U_1 \cap U_2$, so that $\sum_{i=1}^{k} \gamma_i x_i = \sum_{i=1}^{r} \beta_i v_i$ for some scalars $\beta_1, \dots, \beta_r$, and hence $\sum_{i=1}^{r} (\lambda_i + \beta_i) v_i + \sum_{i=1}^{\ell} \mu_i w_i = 0$. Since the vectors $v_1, \dots, v_r, w_1, \dots, w_\ell$ are linearly independent, this implies that $\mu_1 = \dots = \mu_\ell = 0$. But then also

\[ \sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{k} \gamma_i x_i = 0, \]

and hence $\lambda_1 = \dots = \lambda_r = \gamma_1 = \dots = \gamma_k = 0$ due to the linear independence of $v_1, \dots, v_r, x_1, \dots, x_k$. □

If at least one of the subspaces in Theorem 9.29 is infinite dimensional, then the assertion is still formally correct, since in this case $\dim(U_1 + U_2) = \infty$ and $\dim(U_1) + \dim(U_2) = \infty$.


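Example 9.30 For the subspaces

\[ U_1 = \{[\alpha_1, \alpha_2, 0] \mid \alpha_1, \alpha_2 \in K\}, \quad U_2 = \{[0, \alpha_2, \alpha_3] \mid \alpha_2, \alpha_3 \in K\} \subset K^{1,3} \]

we have $\dim(U_1) = \dim(U_2) = 2$,

\[ U_1 \cap U_2 = \{[0, \alpha_2, 0] \mid \alpha_2 \in K\}, \quad \dim(U_1 \cap U_2) = 1, \]
\[ U_1 + U_2 = K^{1,3}, \quad \dim(U_1 + U_2) = 3. \]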
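For a finite field the dimension formula can be checked by brute force, because a subspace of dimension $d$ over the field with two elements has exactly $2^d$ vectors. The sketch below takes $K = \mathrm{GF}(2)$ (an assumption made here; Example 9.30 allows any field) and enumerates $U_1$, $U_2$, their intersection and their sum as plain Python sets. The helper names are illustrative.

```python
from itertools import product

def span(generators, n=3):
    """All GF(2)-linear combinations of the generators, as a set of n-tuples."""
    vectors = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        vectors.add(tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
                          for i in range(n)))
    return vectors

def dim(subspace):
    """A subspace of GF(2)^n with 2^d elements has dimension d."""
    return len(subspace).bit_length() - 1

U1 = span([(1, 0, 0), (0, 1, 0)])     # vectors [a1, a2, 0]
U2 = span([(0, 1, 0), (0, 0, 1)])     # vectors [0, a2, a3]

intersection = U1 & U2
total = {tuple((a + b) % 2 for a, b in zip(u1, u2)) for u1 in U1 for u2 in U2}

print(dim(intersection), dim(total), dim(U1), dim(U2))
```

The four printed dimensions satisfy $\dim(U_1 \cap U_2) + \dim(U_1 + U_2) = \dim(U_1) + \dim(U_2)$, in agreement with Theorem 9.29.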


The above definition of the sum can be extended to an arbitrary (but finite) number of subspaces: If $U_1, \dots, U_k$, $k \geq 2$, are subspaces of the vector space $V$, then we define

\[ U_1 + \dots + U_k = \sum_{j=1}^{k} U_j := \Big\{ \sum_{j=1}^{k} u_j \;\Big|\; u_j \in U_j,\ j = 1, \dots, k \Big\}. \]

This sum is called direct, if

\[ U_i \cap \sum_{\substack{j=1 \\ j \neq i}}^{k} U_j = \{0\} \quad \text{for } i = 1, \dots, k, \]

and in this case we write the (direct) sum as

\[ U_1 \oplus \dots \oplus U_k = \bigoplus_{j=1}^{k} U_j. \]

In particular, a sum $U_1 + U_2$ of two subspaces $U_1, U_2 \subseteq V$ is direct if $U_1 \cap U_2 = \{0\}$.
The following theorem presents two equivalent characterizations of the direct sum of subspaces.


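Theorem 9.31 If $U = U_1 + \dots + U_k$ is a sum of $k \geq 2$ subspaces of a vector space $V$, then the following assertions are equivalent:

(1) The sum $U$ is direct, i.e., $U_i \cap \sum_{j \neq i} U_j = \{0\}$ for $i = 1, \dots, k$.

(2) Every vector $u \in U$ has a representation of the form $u = \sum_{j=1}^{k} u_j$ with uniquely determined $u_j \in U_j$ for $j = 1, \dots, k$.

(3) $\sum_{j=1}^{k} u_j = 0$ with $u_j \in U_j$ for $j = 1, \dots, k$ implies that $u_j = 0$ for $j = 1, \dots, k$.


Proof
(1) ⇒ (2): Let $u = \sum_{j=1}^{k} u_j = \sum_{j=1}^{k} \widetilde{u}_j$ with $u_j, \widetilde{u}_j \in U_j$, $j = 1, \dots, k$. For every $i = 1, \dots, k$ we then have

\[ u_i - \widetilde{u}_i = - \sum_{j \neq i} (u_j - \widetilde{u}_j) \in U_i \cap \sum_{j \neq i} U_j. \]

Now $U_i \cap \sum_{j \neq i} U_j = \{0\}$ implies that $u_i - \widetilde{u}_i = 0$, and hence $u_i = \widetilde{u}_i$ for $i = 1, \dots, k$.

(2) ⇒ (3): This is obvious.

(3) ⇒ (1): For a given $i$, let $u \in U_i \cap \sum_{j \neq i} U_j$. Then $u = \sum_{j \neq i} u_j$ for some $u_j \in U_j$, $j \neq i$, and hence $-u + \sum_{j \neq i} u_j = 0$. In particular, this implies that $u = 0$, and thus $U_i \cap \sum_{j \neq i} U_j = \{0\}$. □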


Exercises 


(In the following exercises K is an arbitrary field.) 


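9.1. Which of the following sets (with the usual addition and scalar multiplication) are $\mathbb{R}$-vector spaces?

\[ \{[\alpha_1, \alpha_2] \in \mathbb{R}^{1,2} \mid \alpha_1 = \alpha_2\}, \quad \{[\alpha_1, \alpha_2] \in \mathbb{R}^{1,2} \mid \alpha_1^2 + \alpha_2^2 = 1\}, \]
\[ \{[\alpha_1, \alpha_2] \in \mathbb{R}^{1,2} \mid \alpha_1 \geq \alpha_2\}, \quad \{[\alpha_1, \alpha_2] \in \mathbb{R}^{1,2} \mid \alpha_1 - \alpha_2 = 0 \text{ and } 2\alpha_1 + \alpha_2 = 0\}. \]

Determine, if possible, a basis and the dimension.

9.2. Determine a basis of the $\mathbb{R}$-vector space $\mathbb{C}$ and $\dim_{\mathbb{R}}(\mathbb{C})$. Determine a basis of the $\mathbb{C}$-vector space $\mathbb{C}$ and $\dim_{\mathbb{C}}(\mathbb{C})$.

9.3. Show that $a_1, \dots, a_n \in K^{n,1}$ are linearly independent if and only if $\det([a_1, \dots, a_n]) \neq 0$.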


9.4. Let $V$ be a $K$-vector space, $\Omega$ a nonempty set and $\mathrm{Map}(\Omega, V)$ the set of maps from $\Omega$ to $V$. Show that $\mathrm{Map}(\Omega, V)$ with the operations

\[ + : \mathrm{Map}(\Omega, V) \times \mathrm{Map}(\Omega, V) \to \mathrm{Map}(\Omega, V), \quad (f, g) \mapsto f + g, \quad \text{with } (f + g)(x) := f(x) + g(x) \text{ for all } x \in \Omega, \]
\[ \cdot : K \times \mathrm{Map}(\Omega, V) \to \mathrm{Map}(\Omega, V), \quad (\lambda, f) \mapsto \lambda \cdot f, \quad \text{with } (\lambda \cdot f)(x) := \lambda f(x) \text{ for all } x \in \Omega, \]

is a $K$-vector space.

9.5. Show that the functions sin and cos in $\mathrm{Map}(\mathbb{R}, \mathbb{R})$ are linearly independent.

9.6. Let $V$ be a vector space with $n = \dim(V) \in \mathbb{N}$ and let $v_1, \dots, v_n \in V$. Show that the following statements are equivalent:

(1) $v_1, \dots, v_n$ are linearly independent.
(2) $\mathrm{span}\{v_1, \dots, v_n\} = V$.
(3) $\{v_1, \dots, v_n\}$ is a basis of $V$.

9.7. Show that $(K^{n,m}, +, \cdot)$ is a $K$-vector space (cp. (1) in Example 9.2). Find a subspace of this $K$-vector space.

9.8. Show that $(K[t], +, \cdot)$ is a $K$-vector space (cp. (2) in Example 9.2). Show further that $K[t]_{\leq n}$ is a subspace of $K[t]$ (cp. (3) in Example 9.6) and determine $\dim(K[t]_{\leq n})$.

9.9. Show that the polynomials $p_1 = t^5 + t^4$, $p_2 = t^5 - 7t^3$, $p_3 = t^5 - 1$, $p_4 = t^5 + 3t$ are linearly independent in $\mathbb{Q}[t]_{\leq 5}$ and extend $\{p_1, p_2, p_3, p_4\}$ to a basis of $\mathbb{Q}[t]_{\leq 5}$.

9.10. Let $n \in \mathbb{N}$ and

\[ K[t_1, t_2] := \Big\{ \sum_{i,j=0}^{n} \alpha_{ij}\, t_1^i t_2^j \;\Big|\; \alpha_{ij} \in K \Big\}. \]

An element of $K[t_1, t_2]$ is called bivariate polynomial over $K$ in the unknowns $t_1$ and $t_2$. Define a scalar multiplication and an addition so that $K[t_1, t_2]$ becomes a vector space. Determine a basis of $K[t_1, t_2]$.

9.11. Show Lemma 9.5.

9.12. Let $A \in K^{n,m}$ and $b \in K^{n,1}$. Is the solution set $\mathcal{L}(A, b)$ of $Ax = b$ a subspace of $K^{m,1}$?

9.13. Let $A \in K^{n,n}$ and let $\lambda \in K$ be an eigenvalue of $A$. Show that the set $\{v \in K^{n,1} \mid Av = \lambda v\}$ is a subspace of $K^{n,1}$.

9.14. Let $A \in K^{n,n}$ and let $\lambda_1 \neq \lambda_2$ be two eigenvalues of $A$. Show that any two associated eigenvectors $v_1$ and $v_2$ are linearly independent.

9.15. Show that $B = \{B_1, B_2, B_3, B_4\}$ and $C = \{C_1, C_2, C_3, C_4\}$ with

\[ B_1 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad B_3 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad B_4 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

and

\[ C_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad C_3 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad C_4 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

are bases of the vector space $K^{2,2}$, and determine corresponding coordinate transformation matrices.

9.16. Examine the elements of the following sets for linear independence in the vector space $K[t]_{\leq 3}$:

\[ U_1 = \{t,\ t^2 + 2t,\ t^2 + 3t + 1,\ t^3\}, \quad U_2 = \{1,\ t,\ t + t^2,\ t^2 + t^3\}, \quad U_3 = \{1,\ t^2 - t,\ t^2 + t,\ t^3\}. \]

Determine the dimensions of the subspaces spanned by the elements of $U_1$, $U_2$, $U_3$. Is one of these sets a basis of $K[t]_{\leq 3}$?

9.17. Show that the set of sequences $\{(\alpha_1, \alpha_2, \alpha_3, \dots) \mid \alpha_i \in \mathbb{Q},\ i \in \mathbb{N}\}$ with entrywise addition and scalar multiplication forms an infinite dimensional vector space, and determine a basis system.

9.18. Prove Lemma 9.23.

9.19. Prove Lemma 9.24.

9.20. Prove Lemma 9.28.

9.21. Let $U_1$, $U_2$ be finite dimensional subspaces of a vector space $V$. Show that the sum $U_1 + U_2$ is direct if $\dim(U_1 + U_2) = \dim(U_1) + \dim(U_2)$.

9.22. Let $U_1, \dots, U_k$, $k \geq 3$, be finite dimensional subspaces of a vector space $V$. Suppose that $U_i \cap U_j = \{0\}$ for all $i \neq j$. Is the sum $U_1 + \dots + U_k$ direct?

9.23. Let $U$ be a subspace of a finite dimensional vector space $V$. Show that there exists another subspace $\widetilde{U}$ with $U \oplus \widetilde{U} = V$. (The subspace $\widetilde{U}$ is called a complement of $U$.)

9.24. Determine three subspaces $U_1$, $U_2$, $U_3$ of $V = \mathbb{R}^{3,1}$ with $U_2 \neq U_3$ and $V = U_1 \oplus U_2 = U_1 \oplus U_3$. Is there a subspace $U$ of $V$ with a uniquely determined complement?

Chapter 10 
Linear Maps 


In this chapter we study maps between vector spaces that are compatible with the two 
vector space operations, addition and scalar multiplication. These maps are called 
linear maps or homomorphisms. We first investigate their most important properties 
and then show that in the case of finite dimensional vector spaces every linear map 
can be represented by a matrix, when bases in the respective spaces have been chosen. 
If the bases are chosen in a clever way, then we can read off important properties of 
a linear map from its matrix representation. This central idea will arise frequently in 
later chapters. 


10.1 Basic Definitions and Properties of Linear Maps 


We start our investigations with the definition of linear maps between vector spaces. 


Definition 10.1 Let $V$ and $W$ be $K$-vector spaces. A map $f : V \to W$ is called linear, when

(1) $f(\lambda v) = \lambda f(v)$, and
(2) $f(v + w) = f(v) + f(w)$,

hold for all $v, w \in V$ and $\lambda \in K$. The set of all these maps is denoted by $L(V, W)$.


A linear map $f : V \to W$ is also called a linear transformation or (vector space) homomorphism. A bijective linear map is called an isomorphism. If there exists an isomorphism between $V$ and $W$, then the spaces $V$ and $W$ are called isomorphic, which we denote by

\[ V \cong W. \]

A map $f \in L(V, V)$ is called an endomorphism, and a bijective endomorphism is called an automorphism.
© Springer International Publishing Switzerland 2015
J. Liesen and V. Mehrmann, Linear Algebra, Springer Undergraduate Mathematics Series, DOI 10.1007/978-3-319-24346-7_10


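It is an easy exercise to show that the conditions (1) and (2) in Definition 10.1 hold if and only if

\[ f(\lambda v + \mu w) = \lambda f(v) + \mu f(w) \]

holds for all $\lambda, \mu \in K$ and $v, w \in V$.

Example 10.2

(1) Every matrix $A \in K^{n,m}$ defines a map

\[ A : K^{m,1} \to K^{n,1}, \quad x \mapsto Ax. \]

This map is linear, since

\[ A(\lambda x) = \lambda A x \quad \text{for all } x \in K^{m,1} \text{ and } \lambda \in K, \]
\[ A(x + y) = Ax + Ay \quad \text{for all } x, y \in K^{m,1} \]

(cp. Lemmas 4.3 and 4.4).

(2) The map $\mathrm{trace} : K^{n,n} \to K$, $A = [a_{ij}] \mapsto \mathrm{trace}(A) := \sum_{i=1}^{n} a_{ii}$, is linear (cp. Exercise 8.8).

(3) The map

\[ f : \mathbb{Q}[t]_{\leq 3} \to \mathbb{Q}[t]_{\leq 2}, \quad \alpha_3 t^3 + \alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto 2\alpha_2 t^2 + 3\alpha_3 t + 4\alpha_0, \]

is linear. (Show this as an exercise.) The map

\[ g : \mathbb{Q}[t]_{\leq 3} \to \mathbb{Q}[t]_{\leq 2}, \quad \alpha_3 t^3 + \alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto \alpha_2 t^2 + \alpha_1 t + \alpha_0^2, \]

is not linear. For example, if $p_1 = t + 2$ and $p_2 = t + 1$, then $g(p_1 + p_2) = 2t + 9 \neq 2t + 5 = g(p_1) + g(p_2)$.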
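The counterexample for the map $g$ in (3) can be replayed mechanically. In the sketch below a polynomial $\alpha_3 t^3 + \alpha_2 t^2 + \alpha_1 t + \alpha_0$ is stored as the tuple $(\alpha_0, \alpha_1, \alpha_2, \alpha_3)$. This storage convention and the helper `add` are choices made here for illustration.

```python
def g(p):
    """g from Example 10.2 (3): keeps alpha_2 and alpha_1, squares alpha_0, drops alpha_3."""
    a0, a1, a2, a3 = p
    return (a0 ** 2, a1, a2)          # coefficients of 1, t, t^2

def add(p, q):
    """Coefficientwise addition of two polynomials."""
    return tuple(a + b for a, b in zip(p, q))

p1 = (2, 1, 0, 0)   # t + 2
p2 = (1, 1, 0, 0)   # t + 1

image_of_sum = g(add(p1, p2))          # g(p1 + p2)
sum_of_images = add(g(p1), g(p2))      # g(p1) + g(p2)
print(image_of_sum, sum_of_images)
```

The two results differ in the constant coefficient (9 versus 5), which is the failure of additivity stated in the example; the squaring of $\alpha_0$ is what breaks linearity.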
The set of linear maps between vector spaces forms a vector space itself. 

Lemma 10.3 Let $V$ and $W$ be $K$-vector spaces. For $f, g \in L(V, W)$ and $\lambda \in K$ define $f + g$ and $\lambda \cdot f$ by

\[ (f + g)(v) := f(v) + g(v), \]
\[ (\lambda \cdot f)(v) := \lambda f(v), \]

for all $v \in V$. Then $(L(V, W), +, \cdot)$ is a $K$-vector space.

Proof Cp. Exercise 9.4. □
The next result deals with the existence and uniqueness of linear maps. 


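Theorem 10.4 Let $V$ and $W$ be $K$-vector spaces, let $\{v_1, \dots, v_m\}$ be a basis of $V$, and let $w_1, \dots, w_m \in W$. Then there exists a unique linear map $f \in L(V, W)$ with $f(v_i) = w_i$ for $i = 1, \dots, m$.

Proof For every $v \in V$ there exist (unique) coordinates $\lambda_1^{(v)}, \dots, \lambda_m^{(v)}$ with $v = \sum_{i=1}^{m} \lambda_i^{(v)} v_i$ (cp. Lemma 9.22). We define the map $f : V \to W$ by

\[ f(v) := \sum_{i=1}^{m} \lambda_i^{(v)} w_i \quad \text{for all } v \in V. \]

By definition, $f(v_i) = w_i$ for $i = 1, \dots, m$.

We next show that $f$ is linear. For every $\lambda \in K$ we have $\lambda v = \sum_{i=1}^{m} (\lambda \lambda_i^{(v)}) v_i$, and hence

\[ f(\lambda v) = \sum_{i=1}^{m} (\lambda \lambda_i^{(v)}) w_i = \lambda \sum_{i=1}^{m} \lambda_i^{(v)} w_i = \lambda f(v). \]

If $u = \sum_{i=1}^{m} \lambda_i^{(u)} v_i \in V$, then $v + u = \sum_{i=1}^{m} (\lambda_i^{(v)} + \lambda_i^{(u)}) v_i$, and hence

\[ f(v + u) = \sum_{i=1}^{m} (\lambda_i^{(v)} + \lambda_i^{(u)}) w_i = \sum_{i=1}^{m} \lambda_i^{(v)} w_i + \sum_{i=1}^{m} \lambda_i^{(u)} w_i = f(v) + f(u). \]

Thus, $f \in L(V, W)$.

Suppose that $g \in L(V, W)$ also satisfies $g(v_i) = w_i$ for $i = 1, \dots, m$. Then for every $v = \sum_{i=1}^{m} \lambda_i^{(v)} v_i$ we have

\[ f(v) = f\Big( \sum_{i=1}^{m} \lambda_i^{(v)} v_i \Big) = \sum_{i=1}^{m} \lambda_i^{(v)} f(v_i) = \sum_{i=1}^{m} \lambda_i^{(v)} w_i = \sum_{i=1}^{m} \lambda_i^{(v)} g(v_i) = g\Big( \sum_{i=1}^{m} \lambda_i^{(v)} v_i \Big) = g(v), \]

and hence $f = g$, so that $f$ is indeed uniquely determined. □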
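The construction in the proof translates directly into code: compute the coordinates of $v$ with respect to the basis, then form the same linear combination of the prescribed images. The sketch below assumes $V = \mathbb{Q}^2$ with the basis $v_1 = (1,1)$, $v_2 = (1,2)$ and two arbitrarily chosen image vectors in $W = \mathbb{Q}^3$. All concrete numbers and names are illustrative assumptions.

```python
basis = [(1, 1), (1, 2)]               # basis {v1, v2} of V (assumed example)
images = [(1, 0, 1), (0, 2, 0)]        # prescribed images w1, w2 in W (arbitrary choice)

def coordinates(v):
    """Coordinates of v w.r.t. {v1, v2}, using the inverse of [[1, 1], [1, 2]]."""
    x, y = v
    return (2 * x - y, -x + y)

def f(v):
    """The linear map with f(v1) = w1 and f(v2) = w2, built as in the proof."""
    lam = coordinates(v)
    return tuple(sum(l * w[i] for l, w in zip(lam, images)) for i in range(3))

def add(a, b):
    return tuple(s + t for s, t in zip(a, b))

u, v = (3, 5), (-1, 4)
print(f(basis[0]), f(basis[1]))        # reproduces w1 and w2
print(f(add(u, v)), add(f(u), f(v)))   # additivity on one sample pair
```

Note that the images need not be linearly independent; any choice of `images` yields a linear map.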


Theorem 10.4 shows that the map $f \in L(V, W)$ is uniquely determined by the images of $f$ at the given basis vectors of $V$. Note that the image vectors $w_1, \dots, w_m \in W$ may be linearly dependent, and that $W$ may be infinite dimensional.

In Definition 2.12 we have introduced the image and pre-image of a map. We next 
recall these definitions for completeness and introduce the kernel of a linear map. 


Definition 10.5 If $V$ and $W$ are $K$-vector spaces and $f \in L(V, W)$, then the kernel and the image of $f$ are defined by

\[ \ker(f) := \{v \in V \mid f(v) = 0\}, \quad \mathrm{im}(f) := \{f(v) \mid v \in V\}. \]

For $w \in W$ the pre-image of $w$ in the space $V$ is defined by

\[ f^{-1}(w) := f^{-1}(\{w\}) = \{v \in V \mid f(v) = w\}. \]

The kernel of a linear map is sometimes called the null space (or nullspace) of the map, and some authors use the notation $\mathrm{null}(f)$ instead of $\ker(f)$.
Note that the pre-image $f^{-1}(w)$ is a set, and that $f^{-1}$ here does not mean the inverse map of $f$ (cp. Definition 2.12). In particular, we have $f^{-1}(0) = \ker(f)$, and if $w \notin \mathrm{im}(f)$, then $f^{-1}(w) = \emptyset$.


Example 10.6 For $A \in K^{n,m}$ and the corresponding map $A \in L(K^{m,1}, K^{n,1})$ from (1) in Example 10.2 we have

\[ \ker(A) = \{x \in K^{m,1} \mid Ax = 0\} \quad \text{and} \quad \mathrm{im}(A) = \{Ax \mid x \in K^{m,1}\}. \]

Note that $\ker(A) = \mathcal{L}(A, 0)$ (cp. Definition 6.1). Let $a_j \in K^{n,1}$ denote the $j$th column of $A$, $j = 1, \dots, m$. For $x = [x_1, \dots, x_m]^T \in K^{m,1}$ we then can write

\[ Ax = \sum_{j=1}^{m} x_j a_j. \]

Clearly, $0 \in \ker(A)$. Moreover, we see from the representation of $Ax$ that $\ker(A) = \{0\}$ if and only if the columns of $A$ are linearly independent. The set $\mathrm{im}(A)$ is given by the linear combinations of the columns of $A$, i.e., $\mathrm{im}(A) = \mathrm{span}\{a_1, \dots, a_m\}$.
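The column representation of $Ax$ is easy to confirm on a small matrix. The following sketch uses an assumed $2 \times 3$ example with integer entries. It compares the row-wise product with the linear combination of columns and exhibits one nonzero kernel vector, which reflects that three columns in $K^{2,1}$ must be linearly dependent.

```python
A = [[1, 2, 0],
     [0, 1, 1]]            # n = 2, m = 3 (assumed example)
x = [3, -1, 2]

def apply(A, x):
    """Row-wise matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

columns = [[row[j] for row in A] for j in range(len(A[0]))]    # a_1, ..., a_m
combination = [sum(x[j] * columns[j][i] for j in range(len(x)))
               for i in range(len(A))]                          # sum_j x_j a_j

x0 = [2, -1, 1]            # a nonzero vector with A x0 = 0
print(apply(A, x), combination, apply(A, x0))
```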


Lemma 10.7 If $V$ and $W$ are $K$-vector spaces, then for every $f \in L(V, W)$ the following assertions hold:

(1) $f(0) = 0$ and $f(-v) = -f(v)$ for all $v \in V$.

(2) If $f$ is an isomorphism, then $f^{-1} \in L(W, V)$.

(3) $\ker(f)$ is a subspace of $V$ and $\mathrm{im}(f)$ is a subspace of $W$.

(4) $f$ is surjective if and only if $\mathrm{im}(f) = W$.

(5) $f$ is injective if and only if $\ker(f) = \{0\}$.

(6) If $f$ is injective and if $v_1, \dots, v_m \in V$ are linearly independent, then $f(v_1), \dots, f(v_m) \in W$ are linearly independent.

(7) If $v_1, \dots, v_m \in V$ are linearly dependent, then $f(v_1), \dots, f(v_m) \in W$ are linearly dependent, or, equivalently, if $f(v_1), \dots, f(v_m) \in W$ are linearly independent, then $v_1, \dots, v_m \in V$ are linearly independent.

(8) If $w \in \mathrm{im}(f)$ and if $u \in f^{-1}(w)$ is arbitrary, then

\[ f^{-1}(w) = u + \ker(f) := \{u + v \mid v \in \ker(f)\}. \]


Proof

(1) We have $f(0_V) = f(0_K \cdot 0_V) = 0_K \cdot f(0_V) = 0_W$ as well as $f(v) + f(-v) = f(v + (-v)) = f(0) = 0$ for all $v \in V$.

(2) The existence of the inverse map $f^{-1} : W \to V$ is guaranteed by Theorem 2.20, so we just have to show that $f^{-1}$ is linear. If $w_1, w_2 \in W$, then there exist uniquely determined $v_1, v_2 \in V$ with $w_1 = f(v_1)$ and $w_2 = f(v_2)$. Hence,

\[ f^{-1}(w_1 + w_2) = f^{-1}(f(v_1) + f(v_2)) = f^{-1}(f(v_1 + v_2)) = v_1 + v_2 = f^{-1}(w_1) + f^{-1}(w_2). \]

Moreover, for every $\lambda \in K$ we have

\[ f^{-1}(\lambda w_1) = f^{-1}(\lambda f(v_1)) = f^{-1}(f(\lambda v_1)) = \lambda v_1 = \lambda f^{-1}(w_1). \]

(3) and (4) are obvious from the corresponding definitions.

(5) Let $f$ be injective and $v \in \ker(f)$, i.e., $f(v) = 0$. From (1) we know that $f(0) = 0$. Since $f(v) = f(0)$, the injectivity of $f$ yields $v = 0$. Suppose now that $\ker(f) = \{0\}$ and let $u, v \in V$ with $f(u) = f(v)$. Then $f(u - v) = 0$, i.e., $u - v \in \ker(f)$, which implies $u - v = 0$, i.e., $u = v$.

(6) Let $\sum_{i=1}^{m} \lambda_i f(v_i) = 0$. The linearity of $f$ yields

\[ f\Big( \sum_{i=1}^{m} \lambda_i v_i \Big) = 0, \quad \text{i.e.,} \quad \sum_{i=1}^{m} \lambda_i v_i \in \ker(f). \]

Since $f$ is injective, we have $\sum_{i=1}^{m} \lambda_i v_i = 0$ by (5), and hence $\lambda_1 = \dots = \lambda_m = 0$ due to the linear independence of $v_1, \dots, v_m$. Thus, $f(v_1), \dots, f(v_m)$ are linearly independent.

(7) If $v_1, \dots, v_m$ are linearly dependent, then $\sum_{i=1}^{m} \lambda_i v_i = 0$ for some $\lambda_1, \dots, \lambda_m \in K$ that are not all equal to zero. Applying $f$ on both sides and using the linearity yields $\sum_{i=1}^{m} \lambda_i f(v_i) = 0$, hence $f(v_1), \dots, f(v_m)$ are linearly dependent.

(8) Let $w \in \mathrm{im}(f)$ and $u \in f^{-1}(w)$.

If $v \in f^{-1}(w)$, then $f(v) = f(u)$, and thus $f(v - u) = 0$, i.e., $v - u \in \ker(f)$ or $v \in u + \ker(f)$. This shows that $f^{-1}(w) \subseteq u + \ker(f)$.

If, on the other hand, $v \in u + \ker(f)$, then $f(v) = f(u) = w$, i.e., $v \in f^{-1}(w)$. This shows that $u + \ker(f) \subseteq f^{-1}(w)$. □


Example 10.8 Consider a matrix $A \in K^{n,m}$ and the corresponding map $A \in L(K^{m,1}, K^{n,1})$ from (1) in Example 10.2. For a given $b \in K^{n,1}$ we have $A^{-1}(b) = \mathcal{L}(A, b)$. If $b \notin \mathrm{im}(A)$, then $\mathcal{L}(A, b) = \emptyset$ (case (1) in Corollary 6.6). Now suppose that $b \in \mathrm{im}(A)$ and let $\widehat{x} \in \mathcal{L}(A, b)$ be arbitrary. Then (8) in Lemma 10.7 yields

\[ \mathcal{L}(A, b) = \widehat{x} + \ker(A), \]

which is the assertion of Lemma 6.2. If $\ker(A) = \{0\}$, i.e., the columns of $A$ are linearly independent, then $|\mathcal{L}(A, b)| = 1$ (case (2) in Corollary 6.6). If $\ker(A) \neq \{0\}$, i.e., the columns of $A$ are linearly dependent, then $|\mathcal{L}(A, b)| > 1$ (case (3) in Corollary 6.6). If $\{w_1, \dots, w_\ell\}$ is a basis of $\ker(A)$, then

\[ \mathcal{L}(A, b) = \Big\{ \widehat{x} + \sum_{i=1}^{\ell} \lambda_i w_i \;\Big|\; \lambda_1, \dots, \lambda_\ell \in K \Big\}. \]

Thus, the solutions of $Ax = b$ depend on $\ell \leq m$ parameters.
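The structure "particular solution plus kernel" can be observed on a concrete system. In this sketch the matrix, the right-hand side, the particular solution `x_hat` and the kernel vector `w1` are assumed example data. For this matrix the kernel is one dimensional, so the solutions depend on a single parameter.

```python
A = [[1, 2, 0],
     [0, 1, 1]]
b = [1, 1]
x_hat = [3, -1, 2]     # one particular solution of A x = b
w1 = [2, -1, 1]        # a basis vector of ker(A) for this A

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# x_hat + lam * w1 for a few integer values of the parameter lam
solutions = [[xh + lam * w for xh, w in zip(x_hat, w1)] for lam in range(-3, 4)]
all_solve = all(apply(A, x) == b for x in solutions)
print(all_solve, solutions[0], solutions[-1])
```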
The following result, which gives an important dimension formula for linear maps, is also known as the rank-nullity theorem: The dimension of the image of $f$ is equal to the rank of a matrix associated with $f$ (cp. Theorem 10.22 below), and the dimension of the kernel (or null space) of $f$ is sometimes called the nullity¹ of $f$.


Theorem 10.9 Let $V$ and $W$ be $K$-vector spaces and let $V$ be finite dimensional. Then for every $f \in L(V, W)$ we have the dimension formula

\[ \dim(V) = \dim(\mathrm{im}(f)) + \dim(\ker(f)). \]

Proof Let $v_1, \dots, v_n \in V$. If $f(v_1), \dots, f(v_n) \in W$ are linearly independent, then by (7) in Lemma 10.7 also $v_1, \dots, v_n$ are linearly independent, and thus $\dim(\mathrm{im}(f)) \leq \dim(V)$. Since $\ker(f) \subseteq V$, we have $\dim(\ker(f)) \leq \dim(V)$, so that $\mathrm{im}(f)$ and $\ker(f)$ are both finite dimensional.

Let $\{w_1, \dots, w_r\}$ and $\{v_1, \dots, v_k\}$ be bases of $\mathrm{im}(f)$ and $\ker(f)$, respectively, and let $u_1 \in f^{-1}(w_1), \dots, u_r \in f^{-1}(w_r)$. We will show that $\{u_1, \dots, u_r, v_1, \dots, v_k\}$ is a basis of $V$, which then implies the assertion.

If $v \in V$, then by Lemma 9.22 there exist (unique) coordinates $\mu_1, \dots, \mu_r \in K$ with $f(v) = \sum_{j=1}^{r} \mu_j w_j$. Let $\widetilde{v} := \sum_{j=1}^{r} \mu_j u_j$, then $f(\widetilde{v}) = f(v)$, and hence $v - \widetilde{v} \in \ker(f)$, which gives $v - \widetilde{v} = \sum_{i=1}^{k} \lambda_i v_i$ for some (unique) coordinates $\lambda_1, \dots, \lambda_k \in K$. Therefore,

\[ v = \widetilde{v} + \sum_{i=1}^{k} \lambda_i v_i = \sum_{i=1}^{r} \mu_i u_i + \sum_{i=1}^{k} \lambda_i v_i, \]

and thus $v \in \mathrm{span}\{u_1, \dots, u_r, v_1, \dots, v_k\}$. Since $\{u_1, \dots, u_r, v_1, \dots, v_k\} \subseteq V$, we have

\[ V = \mathrm{span}\{u_1, \dots, u_r, v_1, \dots, v_k\}, \]

and it remains to show that $u_1, \dots, u_r, v_1, \dots, v_k$ are linearly independent. If

\[ \sum_{i=1}^{r} \alpha_i u_i + \sum_{i=1}^{k} \beta_i v_i = 0, \]

then

\[ 0 = f(0) = f\Big( \sum_{i=1}^{r} \alpha_i u_i + \sum_{i=1}^{k} \beta_i v_i \Big) = \sum_{i=1}^{r} \alpha_i f(u_i) = \sum_{i=1}^{r} \alpha_i w_i, \]

and thus $\alpha_1 = \dots = \alpha_r = 0$, because $w_1, \dots, w_r$ are linearly independent. Finally, the linear independence of $v_1, \dots, v_k$ implies that $\beta_1 = \dots = \beta_k = 0$. □


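¹ This term was introduced in 1884 by James Joseph Sylvester (1814-1897).

Example 10.10

(1) For the linear map

\[ f : \mathbb{Q}^{3,1} \to \mathbb{Q}^{2,1}, \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} \alpha_1 + \alpha_3 \\ \alpha_1 + \alpha_3 \end{bmatrix}, \]

we have

\[ \mathrm{im}(f) = \left\{ \begin{bmatrix} \alpha \\ \alpha \end{bmatrix} \;\middle|\; \alpha \in \mathbb{Q} \right\}, \quad \ker(f) = \left\{ \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ -\alpha_1 \end{bmatrix} \;\middle|\; \alpha_1, \alpha_2 \in \mathbb{Q} \right\}. \]

Hence $\dim(\mathrm{im}(f)) = 1$ and $\dim(\ker(f)) = 2$, so that indeed $\dim(\mathrm{im}(f)) + \dim(\ker(f)) = \dim(\mathbb{Q}^{3,1})$.

(2) If $A \in K^{n,m}$ and $A \in L(K^{m,1}, K^{n,1})$ are as in (1) in Example 10.2, then

\[ m = \dim(K^{m,1}) = \dim(\ker(A)) + \dim(\mathrm{im}(A)). \]

Thus, $\dim(\mathrm{im}(A)) = m$ if and only if $\dim(\ker(A)) = 0$. This holds if and only if $\ker(A) = \{0\}$, i.e., if and only if the columns of $A$ are linearly independent (cp. Example 10.6). If, on the other hand, $\dim(\mathrm{im}(A)) < m$, then $\dim(\ker(A)) = m - \dim(\mathrm{im}(A)) > 0$, and thus $\ker(A) \neq \{0\}$. In this case the columns of $A$ are linearly dependent, since there exists an $x \in K^{m,1} \setminus \{0\}$ with $Ax = 0$.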
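The dimension formula can be checked numerically for a matrix map by computing the rank (the dimension of the image) and exhibiting independent kernel vectors. The sketch below uses the matrix with two identical rows $[1, 0, 1]$, which is an assumption about the garbled matrix in part (1), together with a hand-picked kernel basis. The elimination routine `rank` is an illustrative helper, not a method from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 0, 1],
     [1, 0, 1]]
kernel_basis = [[0, 1, 0], [1, 0, -1]]    # candidates [a1, a2, -a1]

dim_image = rank(A)                        # dimension of the span of the columns
in_kernel = all(apply(A, v) == [0, 0] for v in kernel_basis)
dim_kernel = rank(kernel_basis)            # 2 means the two vectors are independent
print(dim_image, in_kernel, dim_kernel, dim_image + dim_kernel == len(A[0]))
```

Since two independent kernel vectors exist and the rank is 1, the formula $3 = 1 + 2$ confirms that these two vectors already span the kernel.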


Corollary 10.11 If $V$ and $W$ are $K$-vector spaces with $\dim(V) = \dim(W) \in \mathbb{N}$ and if $f \in L(V, W)$, then the following statements are equivalent:

(1) $f$ is injective.
(2) $f$ is surjective.
(3) $f$ is bijective.

Proof If (3) holds, then (1) and (2) hold by definition. We now show that (3) is implied by (1) as well as by (2).

If $f$ is injective, then $\ker(f) = \{0\}$ (cp. (5) in Lemma 10.7) and the dimension formula of Theorem 10.9 yields $\dim(W) = \dim(V) = \dim(\mathrm{im}(f))$. Thus, $\mathrm{im}(f) = W$ (cp. Lemma 9.27), so that $f$ is also surjective.

If $f$ is surjective, i.e., $\mathrm{im}(f) = W$, then the dimension formula and $\dim(W) = \dim(V)$ yield

\[ \dim(\ker(f)) = \dim(V) - \dim(\mathrm{im}(f)) = \dim(W) - \dim(\mathrm{im}(f)) = 0. \]

Thus, $\ker(f) = \{0\}$, so that $f$ is also injective. □


Using Theorem 10.9 we can also characterize when two finite dimensional vector spaces are isomorphic.

Corollary 10.12 Two finite dimensional $K$-vector spaces $V$ and $W$ are isomorphic if and only if $\dim(V) = \dim(W)$.

Proof If $V \cong W$, then there exists a bijective map $f \in L(V, W)$. By (4) and (5) in Lemma 10.7 we have $\mathrm{im}(f) = W$ and $\ker(f) = \{0\}$, and the dimension formula of Theorem 10.9 yields

\[ \dim(V) = \dim(\mathrm{im}(f)) + \dim(\ker(f)) = \dim(W) + \dim(\{0\}) = \dim(W). \]

Let now $\dim(V) = \dim(W)$. We need to show that there exists a bijective $f \in L(V, W)$. Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of $V$ and $W$. By Theorem 10.4 there exists a unique $f \in L(V, W)$ with $f(v_i) = w_i$, $i = 1, \dots, n$. If $v = \lambda_1 v_1 + \dots + \lambda_n v_n \in \ker(f)$, then

\[ 0 = f(v) = f(\lambda_1 v_1 + \dots + \lambda_n v_n) = \lambda_1 f(v_1) + \dots + \lambda_n f(v_n) = \lambda_1 w_1 + \dots + \lambda_n w_n. \]

Since $w_1, \dots, w_n$ are linearly independent, we have $\lambda_1 = \dots = \lambda_n = 0$, hence $v = 0$ and $\ker(f) = \{0\}$. Thus, $f$ is injective. Moreover, the dimension formula yields $\dim(V) = \dim(\mathrm{im}(f)) = \dim(W)$ and, therefore, $\mathrm{im}(f) = W$ (cp. Lemma 9.27), so that $f$ is also surjective. □


Example 10.13

(1) The vector spaces $K^{n,m}$ and $K^{m,n}$ both have the dimension $n \cdot m$ and are therefore isomorphic. An isomorphism is given by the linear map $A \mapsto A^T$.

(2) The $\mathbb{R}$-vector spaces $\mathbb{R}^{1,2}$ and $\mathbb{C} = \{x + \mathrm{i}y \mid x, y \in \mathbb{R}\}$ both have the dimension 2 and are therefore isomorphic. An isomorphism is given by the linear map $[x, y] \mapsto x + \mathrm{i}y$.

(3) The vector spaces $\mathbb{Q}[t]_{\leq 2}$ and $\mathbb{Q}^{1,3}$ both have dimension 3 and are therefore isomorphic. An isomorphism is given by the linear map $\alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto [\alpha_2, \alpha_1, \alpha_0]$.
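The isomorphism in (2) can be tried out with Python's built-in complex numbers. The sketch below checks real-linearity on one sample and shows the inverse map. The names `phi` and `phi_inv` and the sample values are illustrative, and small integers are used so that the floating point arithmetic behind `complex` is exact.

```python
def phi(v):
    """[x, y] -> x + iy, viewing C as a real vector space."""
    x, y = v
    return complex(x, y)

def phi_inv(z):
    """x + iy -> [x, y], the inverse map."""
    return (z.real, z.imag)

u, v = (1, 2), (3, -1)
a, b = 2, -3                                   # real scalars

combo = tuple(a * s + b * t for s, t in zip(u, v))   # a*u + b*v in R^{1,2}
linear_ok = phi(combo) == a * phi(u) + b * phi(v)
roundtrip_ok = phi_inv(phi(u)) == u and phi(phi_inv(4 - 5j)) == 4 - 5j
print(phi(combo), linear_ok, roundtrip_ok)
```

The map is only linear over $\mathbb{R}$; as a $\mathbb{C}$-vector space, $\mathbb{C}$ has dimension 1 and is not isomorphic to a 2-dimensional space over the same field.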


Although Mathematics is a formal and exact science, where the smallest details matter, one sometimes uses an "abuse of notation" in order to simplify the presentation. We have used this for example in the inductive existence proof of the echelon form in Theorem 5.2. There we kept, for simplicity, the indices of the larger matrix $A^{(1)}$ in the smaller matrix $A^{(2)} = [a^{(2)}_{ij}]$. The matrix $A^{(2)}$ had, of course, an entry in position $(1, 1)$, but this entry was denoted by $a^{(2)}_{22}$ rather than $a^{(2)}_{11}$. Keeping the indices in the induction made the argument much less technical, while the proof itself remained formally correct.

An abuse of notation should always be justified and should not be confused with a "misuse" of notation. In the field of Linear Algebra a justification is often given by an isomorphism that identifies vector spaces with each other. For example, the constant polynomials over a field $K$, i.e., polynomials of the form $\alpha t^0$ with $\alpha \in K$, are often written simply as $\alpha$, i.e., as elements of the field itself. This is justified since $K[t]_{\leq 0}$ and $K$ are isomorphic $K$-vector spaces (of dimension 1). We already used this identification above. Similarly, we have identified the vector space $V$ with $V^1$ and written just $v$ instead of $(v)$ in Sect. 9.3. Another common example in the literature is the notation $K^n$ that in our text denotes the set of $n$-tuples with elements from $K$, but which is often used for the (matrix) sets of the "column vectors" $K^{n,1}$ or the "row vectors" $K^{1,n}$. The actual meaning then should be clear from the context. An attentive reader can significantly benefit from the simplifications due to such abuses of notation.


10.2 Linear Maps and Matrices 


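Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$, respectively, and let $f \in \mathcal{L}(V, W)$. By Lemma 9.22, for every $f(v_j) \in W$, $j = 1, \dots, m$, there exist (unique) coordinates $a_{ij} \in K$, $i = 1, \dots, n$, with

\[ f(v_j) = a_{1j} w_1 + \dots + a_{nj} w_n. \]

We define $A := [a_{ij}] \in K^{n,m}$ and write, similarly to (9.3), the $m$ equations for the vectors $f(v_j)$ as

\[ (f(v_1), \dots, f(v_m)) = (w_1, \dots, w_n) A. \tag{10.1} \]

The matrix $A$ is determined uniquely by $f$ and the given bases of $V$ and $W$.
If $v = \lambda_1 v_1 + \dots + \lambda_m v_m \in V$, then

\[
\begin{aligned}
f(v) &= f(\lambda_1 v_1 + \dots + \lambda_m v_m) = \lambda_1 f(v_1) + \dots + \lambda_m f(v_m) \\
&= (f(v_1), \dots, f(v_m)) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \\
&= ((w_1, \dots, w_n) A) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \\
&= (w_1, \dots, w_n) \left( A \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \right).
\end{aligned}
\]

The coordinates of $f(v)$ with respect to the given basis of $W$ are therefore given by

\[ A \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}. \]

Thus, we can compute the coordinates of $f(v)$ simply by multiplying the coordinates
of $v$ with $A$. This motivates the following definition.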


Definition 10.14 The uniquely determined matrix in (10.1) is called the matrix representation of $f \in \mathcal{L}(V, W)$ with respect to the bases $B_1 = \{v_1, \dots, v_m\}$ of $V$ and $B_2 = \{w_1, \dots, w_n\}$ of $W$. We denote this matrix by $[f]_{B_1,B_2}$.


The construction of the matrix representation and Definition 10.14 can be consistently extended to the case that (at least) one of the $K$-vector spaces has dimension zero. If, for instance, $m = \dim(V) \in \mathbb{N}$ and $W = \{0\}$, then $f(v_j) = 0$ for every basis vector $v_j$ of $V$. Thus, every vector $f(v_j)$ is an empty linear combination of vectors of the basis $\emptyset$ of $W$. The matrix representation of $f$ then is an empty matrix of size $0 \times m$. If also $V = \{0\}$, then the matrix representation of $f$ is an empty matrix of size $0 \times 0$.

There are many different notations for the matrix representation of linear maps in the literature. The notation should reflect that the matrix depends on the linear map $f$ and the given bases $B_1$ and $B_2$. Examples of alternative notations are $[f]^{B_1}_{B_2}$ and $M(f)_{B_1,B_2}$ (where "M" means "matrix").

An important special case is obtained for $V = W$, hence in particular $m = n$, and $f = \mathrm{Id}_V$, the identity on $V$. We then obtain

\[ (v_1, \dots, v_n) = (w_1, \dots, w_n) [\mathrm{Id}_V]_{B_1,B_2}, \tag{10.2} \]

so that $[\mathrm{Id}_V]_{B_1,B_2}$ is exactly the matrix $P$ in (9.4), i.e., the coordinate transformation matrix in Theorem 9.25. On the other hand,

\[ (w_1, \dots, w_n) = (v_1, \dots, v_n) [\mathrm{Id}_V]_{B_2,B_1}, \]

and thus

\[ ([\mathrm{Id}_V]_{B_1,B_2})^{-1} = [\mathrm{Id}_V]_{B_2,B_1}. \]


Example 10.15 


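(1) Consider the vector space $\mathbb{Q}[t]_{\leq 1}$ with the bases $B_1 = \{1, t\}$ and $B_2 = \{t + 1, t - 1\}$. Then the linear map

\[ f : \mathbb{Q}[t]_{\leq 1} \to \mathbb{Q}[t]_{\leq 1}, \quad \alpha_1 t + \alpha_0 \mapsto 2\alpha_1 t + \alpha_0, \]

has the matrix representations

\[
[f]_{B_1,B_1} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \quad
[f]_{B_1,B_2} = \begin{bmatrix} \frac{1}{2} & 1 \\ -\frac{1}{2} & 1 \end{bmatrix}, \quad
[f]_{B_2,B_2} = \begin{bmatrix} \frac{3}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{3}{2} \end{bmatrix}.
\]

(2) For the vector space $K[t]_{\leq n}$ with the basis $B = \{t^0, t^1, \dots, t^n\}$ and the linear map

\[ f : K[t]_{\leq n} \to K[t]_{\leq n}, \quad \alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto \alpha_0 t^n + \alpha_1 t^{n-1} + \dots + \alpha_{n-1} t + \alpha_n, \]

we have $f(t^j) = t^{n-j}$ for $j = 0, 1, \dots, n$, so that

\[ [f]_{B,B} = \begin{bmatrix} & & 1 \\ & \iddots & \\ 1 & & \end{bmatrix} \in K^{n+1,n+1}. \]

Thus, $[f]_{B,B}$ is a permutation matrix.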
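The matrix representations in part (1) of this example can be checked mechanically. The following Python sketch is an illustration only and not part of the original text: it assumes that a polynomial $\alpha_1 t + \alpha_0$ is stored as the coefficient list `[a0, a1]`, and the helper names `coords` and `mat_rep` are hypothetical. Column $j$ of the matrix representation is obtained by expressing the image of the $j$-th vector of the first basis in the second basis.

```python
from fractions import Fraction

def coords(basis, target):
    """Coordinates of `target` with respect to `basis` (Gauss-Jordan elimination)."""
    n = len(basis)
    # Augmented matrix: the columns are the basis vectors, the last column is the target.
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def mat_rep(f, basis_from, basis_to):
    """Matrix [f]_{basis_from, basis_to}; column j holds the coordinates of f(v_j)."""
    cols = [coords(basis_to, f(v)) for v in basis_from]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(basis_to))]

def f(p):
    # a1*t + a0  ->  2*a1*t + a0, with p stored as [a0, a1]
    return [p[0], 2 * p[1]]

B1 = [[1, 0], [0, 1]]    # the basis {1, t}
B2 = [[1, 1], [-1, 1]]   # the basis {t + 1, t - 1}

F11 = mat_rep(f, B1, B1)
F12 = mat_rep(f, B1, B2)
F22 = mat_rep(f, B2, B2)
```

Exact rational arithmetic (`Fraction`) is used so that entries such as $\frac{1}{2}$ are reproduced without rounding.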


Theorem 10.16 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$, respectively. Then the map

\[ \mathcal{L}(V, W) \to K^{n,m}, \quad f \mapsto [f]_{B_1,B_2}, \]

is an isomorphism. Hence $\mathcal{L}(V, W) \cong K^{n,m}$ and $\dim(\mathcal{L}(V, W)) = \dim(K^{n,m}) = n \cdot m$.


Proof In this proof we denote the map $f \mapsto [f]_{B_1,B_2}$ by $\mathrm{mat}$, i.e., $\mathrm{mat}(f) = [f]_{B_1,B_2}$. We first show that this map is linear. Let $f, g \in \mathcal{L}(V, W)$, $\mathrm{mat}(f) = [f_{ij}]$ and $\mathrm{mat}(g) = [g_{ij}]$. For $j = 1, \dots, m$ we have

\[ (f + g)(v_j) = f(v_j) + g(v_j) = \sum_{i=1}^n f_{ij} w_i + \sum_{i=1}^n g_{ij} w_i = \sum_{i=1}^n (f_{ij} + g_{ij}) w_i, \]

and thus $\mathrm{mat}(f + g) = [f_{ij} + g_{ij}] = [f_{ij}] + [g_{ij}] = \mathrm{mat}(f) + \mathrm{mat}(g)$. For $\lambda \in K$ and $j = 1, \dots, m$ we have

\[ (\lambda f)(v_j) = \lambda f(v_j) = \lambda \sum_{i=1}^n f_{ij} w_i = \sum_{i=1}^n (\lambda f_{ij}) w_i, \]

and thus $\mathrm{mat}(\lambda f) = [\lambda f_{ij}] = \lambda [f_{ij}] = \lambda\, \mathrm{mat}(f)$.

It remains to show that $\mathrm{mat}$ is bijective. If $f \in \ker(\mathrm{mat})$, i.e., $\mathrm{mat}(f) = 0 \in K^{n,m}$, then $f(v_j) = 0$ for $j = 1, \dots, m$. Thus, $f(v) = 0$ for all $v \in V$, so that $f = 0$ (the zero map) and $\mathrm{mat}$ is injective (cp. (5) in Lemma 10.7). If, on the other hand, $A = [a_{ij}] \in K^{n,m}$ is arbitrary, we define the linear map $f : V \to W$ via $f(v_j) := \sum_{i=1}^n a_{ij} w_i$, $j = 1, \dots, m$ (cp. the proof of Theorem 10.4). Then $\mathrm{mat}(f) = A$ and hence $\mathrm{mat}$ is also surjective (cp. (4) in Lemma 10.7).

Corollary 10.12 now shows that $\dim(\mathcal{L}(V, W)) = \dim(K^{n,m}) = n \cdot m$ (cp. also Example 9.20). □

Theorem 10.16 shows, in particular, that $f, g \in \mathcal{L}(V, W)$ satisfy $f = g$ if and only if $[f]_{B_1,B_2} = [g]_{B_1,B_2}$ holds for given bases $B_1$ of $V$ and $B_2$ of $W$. Thus, we can prove the equality of linear maps via the equality of their matrix representations.

We now consider the map from the elements of a finite dimensional vector space 
to their coordinates with respect to a given basis. 


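Lemma 10.17 If $B = \{v_1, \dots, v_n\}$ is a basis of a $K$-vector space $V$, then the map

\[ \Phi_B : V \to K^{n,1}, \quad v = \lambda_1 v_1 + \dots + \lambda_n v_n \mapsto \Phi_B(v) := \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}, \]

is an isomorphism, called the coordinate map of $V$ with respect to the basis $B$.

Proof The linearity of $\Phi_B$ is clear. Moreover, we obviously have $\Phi_B(V) = K^{n,1}$, i.e., $\Phi_B$ is surjective. If $v \in \ker(\Phi_B)$, i.e., $\lambda_1 = \dots = \lambda_n = 0$, then $v = 0$, so that $\ker(\Phi_B) = \{0\}$ and $\Phi_B$ is also injective (cp. (5) in Lemma 10.7). □

Example 10.18 In the vector space $K[t]_{\leq n}$ with the basis $B = \{t^0, t^1, \dots, t^n\}$ we have

\[ \Phi_B(\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0) = \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} \in K^{n+1,1}. \]

On the other hand, the basis $\tilde{B} = \{t^n, t^{n-1}, \dots, t^0\}$ yields

\[ \Phi_{\tilde{B}}(\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0) = \begin{bmatrix} \alpha_n \\ \alpha_{n-1} \\ \vdots \\ \alpha_0 \end{bmatrix} \in K^{n+1,1}. \]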


If $B_1$ and $B_2$ are bases of the finite dimensional vector spaces $V$ and $W$, respectively, then we can illustrate the meaning and the construction of the matrix representation $[f]_{B_1,B_2}$ of $f \in \mathcal{L}(V, W)$ in the following commutative diagram:

\[
\begin{array}{ccc}
V & \xrightarrow{\;f\;} & W \\
\downarrow \Phi_{B_1} & & \downarrow \Phi_{B_2} \\
K^{m,1} & \xrightarrow{\;[f]_{B_1,B_2}\;} & K^{n,1}
\end{array}
\]

We see that different compositions of maps yield the same result. In particular, we have

\[ f = \Phi_{B_2}^{-1} \circ [f]_{B_1,B_2} \circ \Phi_{B_1}, \tag{10.3} \]

where the matrix $[f]_{B_1,B_2} \in K^{n,m}$ is interpreted as a linear map from $K^{m,1}$ to $K^{n,1}$, and we use that the coordinate map $\Phi_{B_2}$ is bijective and hence invertible. In the same way we obtain

\[ \Phi_{B_2} \circ f = [f]_{B_1,B_2} \circ \Phi_{B_1}, \]

i.e.,

\[ \Phi_{B_2}(f(v)) = [f]_{B_1,B_2}\, \Phi_{B_1}(v) \quad \text{for all } v \in V. \tag{10.4} \]

In words, the coordinates of $f(v)$ with respect to the basis $B_2$ of $W$ are given by the product of $[f]_{B_1,B_2}$ and the coordinates of $v$ with respect to the basis $B_1$ of $V$.

We next show that the consecutive application of linear maps corresponds to the 
multiplication of their matrix representations. 


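Theorem 10.19 Let $V$, $W$ and $X$ be $K$-vector spaces. If $f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then $g \circ f \in \mathcal{L}(V, X)$. Moreover, if $V$, $W$ and $X$ are finite dimensional with respective bases $B_1$, $B_2$ and $B_3$, then

\[ [g \circ f]_{B_1,B_3} = [g]_{B_2,B_3} [f]_{B_1,B_2}. \]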


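Proof Let $h := g \circ f$. We show first that $h \in \mathcal{L}(V, X)$. For $u, v \in V$ and $\lambda, \mu \in K$ we have

\[
\begin{aligned}
h(\lambda u + \mu v) &= g(f(\lambda u + \mu v)) = g(\lambda f(u) + \mu f(v)) \\
&= \lambda g(f(u)) + \mu g(f(v)) = \lambda h(u) + \mu h(v).
\end{aligned}
\]

Now let $B_1 = \{v_1, \dots, v_m\}$, $B_2 = \{w_1, \dots, w_n\}$ and $B_3 = \{x_1, \dots, x_s\}$. If $[f]_{B_1,B_2} = [f_{ij}]$ and $[g]_{B_2,B_3} = [g_{ij}]$, then for $j = 1, \dots, m$ we have

\[
\begin{aligned}
h(v_j) &= g(f(v_j)) = g\Big( \sum_{k=1}^n f_{kj} w_k \Big) = \sum_{k=1}^n f_{kj}\, g(w_k) \\
&= \sum_{k=1}^n f_{kj} \sum_{i=1}^s g_{ik} x_i = \sum_{i=1}^s \Big( \underbrace{\sum_{k=1}^n g_{ik} f_{kj}}_{=:\, h_{ij}} \Big) x_i.
\end{aligned}
\]

Thus, $[h]_{B_1,B_3} = [h_{ij}] = [g_{ij}] [f_{ij}] = [g]_{B_2,B_3} [f]_{B_1,B_2}$. □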
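Theorem 10.19 together with (10.4) can be tried out on coordinate vectors. The sketch below is illustrative only: the matrices `F` and `G` are hypothetical matrix representations chosen for the demonstration (they do not come from the text), and `matmul` and `apply` are ad hoc helper names. It compares applying the two representations one after the other with applying their product once.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, x):
    """Matrix-vector product A x for a coordinate vector x (a plain list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical matrix representations (values chosen only for illustration):
F = [[1, 2], [0, 1], [3, 0]]              # plays the role of [f]_{B1,B2}, dim V = 2, dim W = 3
G = [[1, 0, Fraction(1, 2)], [2, 1, 0]]   # plays the role of [g]_{B2,B3}, dim X = 2

H = matmul(G, F)                          # [g o f]_{B1,B3} according to Theorem 10.19

x = [5, -1]                               # coordinates of some v in V with respect to B1
step_by_step = apply(G, apply(F, x))      # coordinates of g(f(v)), using (10.4) twice
in_one_go = apply(H, x)                   # coordinates of (g o f)(v), using (10.4) once
```

Both routes give the same coordinate vector with respect to $B_3$, which is the statement of the theorem read in coordinates.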


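Using this theorem we can study how a change of the bases affects the matrix representation of a linear map.

Corollary 10.20 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1$, $\tilde{B}_1$ of $V$ and $B_2$, $\tilde{B}_2$ of $W$. If $f \in \mathcal{L}(V, W)$, then

\[ [f]_{B_1,B_2} = [\mathrm{Id}_W]_{\tilde{B}_2,B_2}\, [f]_{\tilde{B}_1,\tilde{B}_2}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1}. \tag{10.5} \]

In particular, the matrices $[f]_{B_1,B_2}$ and $[f]_{\tilde{B}_1,\tilde{B}_2}$ are equivalent.

Proof Applying Theorem 10.19 twice to the identity $f = \mathrm{Id}_W \circ f \circ \mathrm{Id}_V$ yields

\[
\begin{aligned}
[f]_{B_1,B_2} &= [(\mathrm{Id}_W \circ f) \circ \mathrm{Id}_V]_{B_1,B_2} \\
&= [\mathrm{Id}_W \circ f]_{\tilde{B}_1,B_2}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1} \\
&= [\mathrm{Id}_W]_{\tilde{B}_2,B_2}\, [f]_{\tilde{B}_1,\tilde{B}_2}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1}.
\end{aligned}
\]

The matrices $[f]_{B_1,B_2}$ and $[f]_{\tilde{B}_1,\tilde{B}_2}$ are equivalent, since both $[\mathrm{Id}_W]_{\tilde{B}_2,B_2}$ and $[\mathrm{Id}_V]_{B_1,\tilde{B}_1}$ are invertible. □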


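If $V = W$, $B_1 = B_2$, and $\tilde{B}_1 = \tilde{B}_2$, then (10.5) becomes

\[ [f]_{B_1,B_1} = [\mathrm{Id}_V]_{\tilde{B}_1,B_1}\, [f]_{\tilde{B}_1,\tilde{B}_1}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1} = ([\mathrm{Id}_V]_{B_1,\tilde{B}_1})^{-1}\, [f]_{\tilde{B}_1,\tilde{B}_1}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1}. \]

Thus, the matrix representations $[f]_{B_1,B_1}$ and $[f]_{\tilde{B}_1,\tilde{B}_1}$ of the endomorphism $f \in \mathcal{L}(V, V)$ are similar (cp. Definition 8.11).

The following commutative diagram illustrates Corollary 10.20:

\[
\begin{array}{ccc}
K^{m,1} & \xrightarrow{\;[f]_{B_1,B_2}\;} & K^{n,1} \\
\uparrow \Phi_{B_1} & & \uparrow \Phi_{B_2} \\
V & \xrightarrow{\;f\;} & W \\
\downarrow \Phi_{\tilde{B}_1} & & \downarrow \Phi_{\tilde{B}_2} \\
K^{m,1} & \xrightarrow{\;[f]_{\tilde{B}_1,\tilde{B}_2}\;} & K^{n,1}
\end{array}
\tag{10.6}
\]

Here the two copies of $K^{m,1}$ are connected by the coordinate transformation matrix $[\mathrm{Id}_V]_{B_1,\tilde{B}_1}$ and the two copies of $K^{n,1}$ by $[\mathrm{Id}_W]_{B_2,\tilde{B}_2}$, both read from the top row to the bottom row.

Analogously to (10.3) we have

\[ f = \Phi_{B_2}^{-1} \circ [f]_{B_1,B_2} \circ \Phi_{B_1} = \Phi_{\tilde{B}_2}^{-1} \circ [f]_{\tilde{B}_1,\tilde{B}_2} \circ \Phi_{\tilde{B}_1}. \]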


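Example 10.21 For the following bases of the vector space $\mathbb{Q}^{2,2}$,

\[
B_1 = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\},
\]
\[
B_2 = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\},
\]

we have the coordinate transformation matrices

\[
[\mathrm{Id}_V]_{B_1,B_2} = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & -1 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\]

and

\[
[\mathrm{Id}_V]_{B_2,B_1} = ([\mathrm{Id}_V]_{B_1,B_2})^{-1} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.
\]

The coordinate maps are

\[
\Phi_{B_1}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) = \begin{bmatrix} a_{11} \\ a_{12} \\ a_{21} \\ a_{22} \end{bmatrix}, \quad
\Phi_{B_2}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) = \begin{bmatrix} a_{22} \\ a_{11} - a_{12} - a_{22} \\ a_{12} \\ a_{21} \end{bmatrix},
\]

and one can easily verify that

\[
\Phi_{B_2}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) = \left( [\mathrm{Id}_V]_{B_1,B_2} \circ \Phi_{B_1} \right) \left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right).
\]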
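The verification asked for in this example can be sketched in a few lines of Python. The block is an illustration under stated assumptions, not part of the original text: it takes $B_1$ to be the four matrix units of $\mathbb{Q}^{2,2}$ and $B_2$ to consist of the matrices with rows $(1,0),(0,1)$; $(1,0),(0,0)$; $(1,1),(0,0)$; $(0,0),(1,0)$, which is how the example is read here. The helper names and the sample matrix are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Coordinate transformation matrices of the example.
M12 = [[0, 0, 0, 1], [1, -1, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]   # [Id_V]_{B1,B2}
M21 = [[1, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]     # [Id_V]_{B2,B1}
identity = [[int(i == j) for j in range(4)] for i in range(4)]

def phi_B1(a):
    """Coordinates of the 2x2 matrix a with respect to B1 (the matrix units)."""
    return [a[0][0], a[0][1], a[1][0], a[1][1]]

def phi_B2(a):
    """Coordinates of the 2x2 matrix a with respect to B2."""
    return [a[1][1], a[0][0] - a[0][1] - a[1][1], a[0][1], a[1][0]]

sample = [[3, 5], [-2, 7]]   # an arbitrary test matrix
```

With these definitions, `matmul(M12, M21)` and `matmul(M21, M12)` can be compared with `identity`, and `apply(M12, phi_B1(sample))` with `phi_B2(sample)`.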


Theorem 10.22 Let $V$ and $W$ be $K$-vector spaces with $\dim(V) = m$ and $\dim(W) = n$, respectively, and let $f \in \mathcal{L}(V, W)$. Then there exist bases $B_1$ of $V$ and $B_2$ of $W$ such that

\[ [f]_{B_1,B_2} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m}, \]

where $0 \leq r = \dim(\mathrm{im}(f)) \leq \min\{n, m\}$. Furthermore, $r = \mathrm{rank}(F)$, where $F$ is the matrix representation of $f$ with respect to arbitrary bases of $V$ and $W$, and we define $\mathrm{rank}(f) := \mathrm{rank}(F) = \dim(\mathrm{im}(f))$.


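Proof Let $\tilde{B}_1 = \{\tilde{v}_1, \dots, \tilde{v}_m\}$ and $\tilde{B}_2 = \{\tilde{w}_1, \dots, \tilde{w}_n\}$ be two arbitrary bases of $V$ and $W$, respectively. Let $r := \mathrm{rank}([f]_{\tilde{B}_1,\tilde{B}_2})$. Then by Theorem 5.11 there exist invertible matrices $Q \in K^{n,n}$ and $Z \in K^{m,m}$ with

\[ Q\, [f]_{\tilde{B}_1,\tilde{B}_2}\, Z = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \tag{10.7} \]

where $r = \mathrm{rank}([f]_{\tilde{B}_1,\tilde{B}_2}) \leq \min\{n, m\}$. Let us introduce two new bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$ of $V$ and $W$ via

\[ (v_1, \dots, v_m) := (\tilde{v}_1, \dots, \tilde{v}_m) Z, \]
\[ (w_1, \dots, w_n) := (\tilde{w}_1, \dots, \tilde{w}_n) Q^{-1}, \quad \text{hence} \quad (\tilde{w}_1, \dots, \tilde{w}_n) = (w_1, \dots, w_n) Q. \]

Then, by construction,

\[ Z = [\mathrm{Id}_V]_{B_1,\tilde{B}_1}, \quad Q = [\mathrm{Id}_W]_{\tilde{B}_2,B_2}. \]

From (10.7) and Corollary 10.20 we obtain

\[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} = [\mathrm{Id}_W]_{\tilde{B}_2,B_2}\, [f]_{\tilde{B}_1,\tilde{B}_2}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1} = [f]_{B_1,B_2}. \]

We thus have found bases $B_1$ and $B_2$ that yield the desired matrix representation of $f$. Every other choice of bases leads, by Corollary 10.20, to an equivalent matrix which therefore also has rank $r$. It remains to show that $r = \dim(\mathrm{im}(f))$.

The structure of the matrix $[f]_{B_1,B_2}$ shows that

\[ f(v_j) = \begin{cases} w_j, & 1 \leq j \leq r, \\ 0, & r + 1 \leq j \leq m. \end{cases} \]

Therefore, $v_{r+1}, \dots, v_m \in \ker(f)$, which implies that $\dim(\ker(f)) \geq m - r$. On the other hand, $w_1, \dots, w_r \in \mathrm{im}(f)$ and thus $\dim(\mathrm{im}(f)) \geq r$. Theorem 10.9 yields

\[ \dim(V) = m = \dim(\mathrm{im}(f)) + \dim(\ker(f)), \]

and hence $\dim(\ker(f)) = m - r$ and $\dim(\mathrm{im}(f)) = r$. □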


Example 10.23 For $A \in K^{n,m}$ and the corresponding map $A \in \mathcal{L}(K^{m,1}, K^{n,1})$ from (1) in Examples 10.2 and 10.6, we have $\mathrm{im}(A) = \mathrm{span}\{a_1, \dots, a_m\}$, where $a_1, \dots, a_m$ denote the columns of $A$. Thus, $\mathrm{rank}(A)$ is equal to the maximal number of linearly independent columns of $A$. Since $\mathrm{rank}(A) = \mathrm{rank}(A^T)$ (cp. (4) in Theorem 5.11), this number is equal to the maximal number of linearly independent rows of $A$.
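The equality of the column rank and the row rank can be observed numerically. The following Python sketch is only an illustration: the function `rank` is a hypothetical helper that counts pivots in a Gaussian elimination with exact fractions, and the matrix `A` is an arbitrary example that does not appear in the text.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows), computed by Gaussian elimination over Q."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0  # number of pivots found so far
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # the second row is twice the first row

rank_A = rank(A)
rank_At = rank(transpose(A))
```

For this matrix both calls return the same value, in line with $\mathrm{rank}(A) = \mathrm{rank}(A^T)$.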


Theorem 10.22 is a first example of a general strategy that we will use several 
times in the following chapters: 

By choosing appropriate bases, the matrix representation should reveal a desired 
information about a linear map in an efficient way. 

In Theorem 10.22 this information is the rank of the linear map $f$, i.e., the dimension of its image.

The dimension formula for linear maps can be generalized to the composition of maps as follows.

Theorem 10.24 If $V$, $W$ and $X$ are finite dimensional $K$-vector spaces, $f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then

\[ \dim(\mathrm{im}(g \circ f)) = \dim(\mathrm{im}(f)) - \dim(\mathrm{im}(f) \cap \ker(g)). \]

Proof Let $\tilde{g} := g|_{\mathrm{im}(f)}$ be the restriction of $g$ to the image of $f$, i.e., the map

\[ \tilde{g} \in \mathcal{L}(\mathrm{im}(f), X), \quad v \mapsto g(v). \]

Applying Theorem 10.9 to $\tilde{g}$ yields

\[ \dim(\mathrm{im}(f)) = \dim(\mathrm{im}(\tilde{g})) + \dim(\ker(\tilde{g})). \]

Now

\[ \mathrm{im}(\tilde{g}) = \{g(v) \in X \mid v \in \mathrm{im}(f)\} = \mathrm{im}(g \circ f) \]

and

\[ \ker(\tilde{g}) = \{v \in \mathrm{im}(f) \mid g(v) = 0\} = \mathrm{im}(f) \cap \ker(g) \]

imply the assertion. □
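A small worked instance may help to see the correction term of Theorem 10.24 at work. The matrices below are chosen only for illustration (they do not appear in the text); they are interpreted as linear maps on $\mathbb{R}^{2,1}$, with $A$ in the role of $f$ and $B$ in the role of $g$, and $e_1$ denotes the first canonical basis vector.

```latex
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad
BA = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
\]
\[
\mathrm{im}(A) = \mathrm{span}\{e_1\}, \qquad
\ker(B) = \mathrm{span}\{e_1\}, \qquad
\dim(\mathrm{im}(A) \cap \ker(B)) = 1,
\]
\[
\dim(\mathrm{im}(BA)) = 0 = 1 - 1 = \dim(\mathrm{im}(A)) - \dim(\mathrm{im}(A) \cap \ker(B)).
\]
```

Here the whole image of $A$ lies in the kernel of $B$, so the composition loses all of the rank of $A$.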


Note that Theorem 10.24 with $V = W$, $f = \mathrm{Id}_V$, and $g \in \mathcal{L}(V, X)$ gives $\dim(\mathrm{im}(g)) = \dim(V) - \dim(\ker(g))$, which is equivalent to Theorem 10.9.

If we interpret matrices $A \in K^{n,m}$ and $B \in K^{s,n}$ as linear maps, then Theorem 10.24 implies the equation

\[ \mathrm{rank}(BA) = \mathrm{rank}(A) - \dim(\mathrm{im}(A) \cap \ker(B)). \]

For the special case $K = \mathbb{R}$ and $B = A^T$ we have the following result.


Corollary 10.25 If $A \in \mathbb{R}^{n,m}$, then $\mathrm{rank}(A^T A) = \mathrm{rank}(A)$.


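Proof Let $w = [\omega_1, \dots, \omega_n]^T \in \mathrm{im}(A) \cap \ker(A^T)$. Then $w = Ay$ for a vector $y \in \mathbb{R}^{m,1}$. Multiplying this equation from the left by $A^T$, and using that $w \in \ker(A^T)$, we obtain $0 = A^T w = A^T A y$, which implies

\[ 0 = y^T A^T A y = w^T w = \sum_{j=1}^n \omega_j^2. \]

Since this holds only for $w = 0$, we have $\mathrm{im}(A) \cap \ker(A^T) = \{0\}$, and the assertion follows from the rank equation above with $B = A^T$. □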
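The role of the assumption $K = \mathbb{R}$ in Corollary 10.25 can be made visible with a short experiment. The Python sketch below is illustrative and not part of the original text; the helper names and both matrices are hypothetical choices. It checks the statement for one rational (hence real) matrix and then shows that the analogous statement with the plain transpose fails for a complex matrix, where $w^T w = 0$ no longer forces $w = 0$.

```python
from fractions import Fraction

def transpose(A):
    """Transpose (without complex conjugation) of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(A):
    """Rank via Gaussian elimination; entries should be Fractions or complex numbers."""
    M = [list(row) for row in A]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# A real (here: rational) example, for which the corollary applies.
A_real = [[Fraction(x) for x in row] for row in [[1, 2], [2, 4], [0, 1]]]
gram_real = matmul(transpose(A_real), A_real)

# A complex example: w = [1, i]^T satisfies w^T w = 1 + i*i = 0 although w != 0.
A_cplx = [[1 + 0j], [1j]]
gram_cplx = matmul(transpose(A_cplx), A_cplx)
```

In the complex case one works with $A^H A$ instead of $A^T A$; the Hermitian transpose is introduced in Definition 11.20.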
Exercises

(In the following exercises $K$ is an arbitrary field.)

10.1 Consider the linear map on $\mathbb{R}^{3,1}$ given by the matrix

\[ A = \begin{bmatrix} 2 & 0 & 1 \\ 2 & 1 & 0 \\ 4 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{3,3}. \]

Determine $\ker(A)$, $\dim(\ker(A))$ and $\dim(\mathrm{im}(A))$.
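10.2 Construct a map $f \in \mathcal{L}(V, W)$ such that for linearly independent vectors $v_1, \dots, v_r \in V$ the images $f(v_1), \dots, f(v_r) \in W$ are linearly dependent.
10.3 The map

\[ f : \mathbb{R}[t]_{\leq n} \to \mathbb{R}[t]_{\leq n-1}, \]
\[ \alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto n \alpha_n t^{n-1} + (n-1) \alpha_{n-1} t^{n-2} + \dots + 2 \alpha_2 t + \alpha_1, \]

is called the derivative of the polynomial $p \in \mathbb{R}[t]_{\leq n}$ with respect to the variable $t$. Show that $f$ is linear and determine $\ker(f)$ and $\mathrm{im}(f)$.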
Í 0 0 1 0 
10.4 For the bases B; = 011,111,10 of R>! and B> = il | , | j 
0 1 
0 0 l 
of R>!, let f € L(R>!, R>!) have the matrix representation [f]g, 3, = 


0 23 
1—20 


2 1 —|] 
(a) Determine [f]z, 3, for the bases B; = 11,101, 2 of 
—] 3 l 


e ana B= (f1) J or 


(b) Determine the coordinates of f ([4, 1, 3]’) with respect to the basis B». 


10.5 Construct a map $f \in \mathcal{L}(K[t], K[t])$ with the following properties:
(1) $f(pq) = (f(p))q + p(f(q))$ for all $p, q \in K[t]$.
(2) $f(t) = 1$.

Is this map uniquely determined by these properties or are there further maps with the same properties?
10.6 Let $\alpha \in K$ and $A \in K^{n,n}$. Show that the maps

\[ K[t] \to K, \quad p \mapsto p(\alpha), \quad \text{and} \quad K[t] \to K^{n,n}, \quad p \mapsto p(A), \]

are linear and justify the name evaluation homomorphism for these maps.


10.7 Let $S \in GL_n(K)$. Show that the map $f : K^{n,n} \to K^{n,n}$, $A \mapsto S^{-1} A S$ is an isomorphism.

10.8 Let $K$ be a field with $1 + 1 \neq 0$ and let $A \in K^{n,n}$. Consider the map

\[ f : K^{n,1} \to K, \quad x \mapsto x^T A x. \]

Is $f$ a linear map? Show that $f = 0$ if and only if $A + A^T = 0$.

10.9 Let $V$ be a $\mathbb{Q}$-vector space with the basis $B_1 = \{v_1, \dots, v_n\}$ and let $f \in \mathcal{L}(V, V)$ be defined by

\[ f(v_j) = \begin{cases} v_j + v_{j+1}, & j = 1, \dots, n-1, \\ v_1 + v_n, & j = n. \end{cases} \]

(a) Determine $[f]_{B_1,B_1}$.

(b) Let $B_2 = \{w_1, \dots, w_n\}$ with $w_j = j\, v_{n+1-j}$, $j = 1, \dots, n$. Show that $B_2$ is a basis of $V$. Determine the coordinate transformation matrices $[\mathrm{Id}_V]_{B_1,B_2}$ and $[\mathrm{Id}_V]_{B_2,B_1}$, as well as the matrix representations $[f]_{B_1,B_2}$ and $[f]_{B_2,B_2}$.

10.10 Can you extend Theorem 10.19 consistently to the case $W = \{0\}$? What are the properties of the matrices $[g \circ f]_{B_1,B_3}$, $[g]_{B_2,B_3}$ and $[f]_{B_1,B_2}$?

10.11 Consider the map

\[ f : \mathbb{R}[t]_{\leq n} \to \mathbb{R}[t]_{\leq n+1}, \]
\[ \alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto \frac{1}{n+1} \alpha_n t^{n+1} + \frac{1}{n} \alpha_{n-1} t^n + \dots + \frac{1}{2} \alpha_1 t^2 + \alpha_0 t. \]

(a) Show that $f$ is linear. Determine $\ker(f)$ and $\mathrm{im}(f)$.

(b) Choose bases $B_1$, $B_2$ in the two vector spaces and verify that for your choice $\mathrm{rank}([f]_{B_1,B_2}) = \dim(\mathrm{im}(f))$ holds.

10.12 Let $\alpha_1, \dots, \alpha_n \in \mathbb{R}$, $n \geq 2$, be pairwise distinct numbers and let $n$ polynomials in $\mathbb{R}[t]$ be defined by

\[ p_j := \prod_{\substack{k=1 \\ k \neq j}}^{n} \frac{1}{\alpha_j - \alpha_k} (t - \alpha_k), \quad j = 1, \dots, n. \]

(a) Show that the set $B = \{p_1, \dots, p_n\}$ is a basis of $\mathbb{R}[t]_{\leq n-1}$. (This basis is called the Lagrange basis² of $\mathbb{R}[t]_{\leq n-1}$.)
(b) Show that the corresponding coordinate map is given by

\[ \Phi_B : \mathbb{R}[t]_{\leq n-1} \to \mathbb{R}^{n,1}, \quad p \mapsto \begin{bmatrix} p(\alpha_1) \\ \vdots \\ p(\alpha_n) \end{bmatrix}. \]

(Hint: You can use Exercise 7.8 (b).)

² Joseph-Louis de Lagrange (1736-1813).


10.13 Verify different paths in the commutative diagram (10.6) for the vector spaces and bases of Example 10.21 and linear map $f : \mathbb{Q}^{2,2} \to \mathbb{Q}^{2,2}$, $A \mapsto FA$ with


re] 11] 


Chapter 11 
Linear Forms and Bilinear Forms 


In this chapter we study different classes of maps between one or two K -vector spaces 
and the one dimensional K -vector space defined by the field K itself. These maps 
play an important role in many areas of Mathematics, including Analysis, Functional 
Analysis and the solution of differential equations. They will also be essential for 
the further developments in this book: Using bilinear and sesquilinear forms, which 
are introduced in this chapter, we will define and study Euclidean and unitary vector 
spaces in Chap. 12. Linear forms and dual spaces will be used in the existence proof 
of the Jordan canonical form in Chap. 16. 


11.1 Linear Forms and Dual Spaces 


We start with the set of linear maps from a K -vector space to the vector space K. 


Definition 11.1 If $V$ is a $K$-vector space, then $f \in \mathcal{L}(V, K)$ is called a linear form on $V$. The $K$-vector space $V^* := \mathcal{L}(V, K)$ is called the dual space of $V$.


A linear form is sometimes called a linear functional or a one-form, which stresses 
that it (linearly) maps into a one dimensional vector space. 


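Example 11.2 If $V$ is the $\mathbb{R}$-vector space of the continuous and real valued functions on the real interval $[\alpha, \beta]$ and if $\gamma \in [\alpha, \beta]$, then the two maps

\[ f_1 : V \to \mathbb{R}, \quad g \mapsto g(\gamma), \]
\[ f_2 : V \to \mathbb{R}, \quad g \mapsto \int_\alpha^\beta g(x)\, dx, \]

are linear forms on $V$.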


© Springer International Publishing Switzerland 2015 155 
J. Liesen and V. Mehrmann, Linear Algebra, Springer Undergraduate 
Mathematics Series, DOI 10.1007/978-3-3 19-24346-7_11 




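If $\dim(V) = n$, then $\dim(V^*) = n$ by Theorem 10.16. Let $B_1 = \{v_1, \dots, v_n\}$ be a basis of $V$ and let $B_2 = \{1\}$ be a basis of the $K$-vector space $K$. If $f \in V^*$, then $f(v_i) = \alpha_i$ for some $\alpha_i \in K$, $i = 1, \dots, n$, and

\[ [f]_{B_1,B_2} = [\alpha_1, \dots, \alpha_n] \in K^{1,n}. \]

For an element $v = \sum_{i=1}^n \lambda_i v_i \in V$ we have

\[
f(v) = f\Big( \sum_{i=1}^n \lambda_i v_i \Big) = \sum_{i=1}^n \lambda_i f(v_i) = \sum_{i=1}^n \lambda_i \alpha_i
= \underbrace{[\alpha_1, \dots, \alpha_n] \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}}_{\in K^{1,1}}
= [f]_{B_1,B_2}\, \Phi_{B_1}(v),
\]

where we have identified the isomorphic vector spaces $K$ and $K^{1,1}$ with each other.
For a given basis of a finite dimensional vector space $V$ we will now construct a special, uniquely determined basis of the dual space $V^*$.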


Theorem 11.3 If $V$ is a $K$-vector space with the basis $B = \{v_1, \dots, v_n\}$, then there exists a unique basis $B^* = \{v_1^*, \dots, v_n^*\}$ of $V^*$ such that

\[ v_i^*(v_j) = \delta_{ij}, \quad i, j = 1, \dots, n, \]

which is called the dual basis of $B$.

Proof By Theorem 10.4, a unique linear map from $V$ to $K$ can be constructed by prescribing its images at the given basis $B$. Thus, for each $i = 1, \dots, n$, there exists a unique map $v_i^* \in \mathcal{L}(V, K)$ with $v_i^*(v_j) = \delta_{ij}$, $j = 1, \dots, n$.
It remains to show that $B^* := \{v_1^*, \dots, v_n^*\}$ is a basis of $V^*$. If $\lambda_1, \dots, \lambda_n \in K$ are such that

\[ \sum_{i=1}^n \lambda_i v_i^* = 0_{V^*} \in V^*, \]

then

\[ 0 = 0_{V^*}(v_j) = \sum_{i=1}^n \lambda_i v_i^*(v_j) = \lambda_j, \quad j = 1, \dots, n. \]

Thus, $v_1^*, \dots, v_n^*$ are linearly independent, and $\dim(V^*) = n$ implies that $B^*$ is a basis of $V^*$ (cp. Exercise 9.6). □
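For $V = K^{n,1}$ the dual basis can be computed explicitly. If the basis vectors are collected as the columns of an invertible matrix $P$, then the rows of $P^{-1}$ represent the linear forms $v_1^*, \dots, v_n^*$, since $P^{-1} P = I_n$ is exactly the condition $v_i^*(v_j) = \delta_{ij}$. The Python sketch below illustrates this for a basis of $\mathbb{Q}^{2,1}$ chosen purely for the demonstration; the helper names `inverse` and `pairing` are hypothetical.

```python
from fractions import Fraction

def inverse(P):
    """Inverse of an invertible square matrix via Gauss-Jordan elimination over Q."""
    n = len(P)
    # Augment P with the identity matrix.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(P)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def pairing(form, v):
    """Value of the linear form with coefficient row `form` at the column vector v."""
    return sum(a * b for a, b in zip(form, v))

basis = [[1, 0], [1, 1]]                                   # v1 = [1,0]^T, v2 = [1,1]^T (illustrative)
P = [[basis[j][i] for j in range(2)] for i in range(2)]    # the columns of P are v1, v2
dual = inverse(P)                                          # row i represents v_i^*

table = [[pairing(dual[i], basis[j]) for j in range(2)] for i in range(2)]
```

The list `table` collects the values $v_i^*(v_j)$ and should reproduce the Kronecker delta.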


Example 11.4 Consider $V = K^{n,1}$ with the canonical basis $B = \{e_1, \dots, e_n\}$. If $\{e_1^*, \dots, e_n^*\}$ is the dual basis of $B$, then $e_i^*(e_j) = \delta_{ij}$, which shows that $[e_i^*]_{B,\{1\}} = e_i^T \in K^{1,n}$, $i = 1, \dots, n$.

Definition 11.5 Let $V$ and $W$ be $K$-vector spaces with their respective dual spaces $V^*$ and $W^*$, and let $f \in \mathcal{L}(V, W)$. Then

\[ f^* : W^* \to V^*, \quad h \mapsto f^*(h) := h \circ f, \]

is called the dual map of $f$.


We next derive some properties of the dual map. 


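Lemma 11.6 If $V$, $W$ and $X$ are $K$-vector spaces, then the following assertions hold:

(1) If $f \in \mathcal{L}(V, W)$, then the dual map $f^*$ is linear, hence $f^* \in \mathcal{L}(W^*, V^*)$.

(2) If $f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then $(g \circ f)^* \in \mathcal{L}(X^*, V^*)$ and $(g \circ f)^* = f^* \circ g^*$.

(3) If $f \in \mathcal{L}(V, W)$ is bijective, then $f^* \in \mathcal{L}(W^*, V^*)$ is bijective and $(f^*)^{-1} = (f^{-1})^*$.

Proof (1) If $h_1, h_2 \in W^*$, $\lambda_1, \lambda_2 \in K$, then

\[
\begin{aligned}
f^*(\lambda_1 h_1 + \lambda_2 h_2) &= (\lambda_1 h_1 + \lambda_2 h_2) \circ f = (\lambda_1 h_1) \circ f + (\lambda_2 h_2) \circ f \\
&= \lambda_1 (h_1 \circ f) + \lambda_2 (h_2 \circ f) = \lambda_1 f^*(h_1) + \lambda_2 f^*(h_2).
\end{aligned}
\]

(2) and (3) are exercises. □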
As the following theorem shows, the concepts of the dual map and the transposed 
matrix are closely related. 


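Theorem 11.7 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1$ and $B_2$, respectively. Let $B_1^*$ and $B_2^*$ be the corresponding dual bases. If $f \in \mathcal{L}(V, W)$, then

\[ [f^*]_{B_2^*,B_1^*} = ([f]_{B_1,B_2})^T. \]

Proof Let $B_1 = \{v_1, \dots, v_m\}$, $B_2 = \{w_1, \dots, w_n\}$, and let $B_1^* = \{v_1^*, \dots, v_m^*\}$, $B_2^* = \{w_1^*, \dots, w_n^*\}$. Let $[f]_{B_1,B_2} = [a_{ij}] \in K^{n,m}$, i.e.,

\[ f(v_j) = \sum_{i=1}^n a_{ij} w_i, \quad j = 1, \dots, m, \]

and $[f^*]_{B_2^*,B_1^*} = [b_{ij}] \in K^{m,n}$, i.e.,

\[ f^*(w_j^*) = \sum_{i=1}^m b_{ij} v_i^*, \quad j = 1, \dots, n. \]

For every pair $(k, \ell)$ with $1 \leq k \leq n$ and $1 \leq \ell \leq m$ we then have

\[
\begin{aligned}
a_{k\ell} &= \sum_{i=1}^n a_{i\ell}\, w_k^*(w_i) = w_k^*\Big( \sum_{i=1}^n a_{i\ell} w_i \Big) = w_k^*(f(v_\ell)) = f^*(w_k^*)(v_\ell) \\
&= \Big( \sum_{i=1}^m b_{ik} v_i^* \Big)(v_\ell) = \sum_{i=1}^m b_{ik}\, v_i^*(v_\ell) \\
&= b_{\ell k},
\end{aligned}
\]

where we have used the definition of the dual map as well as $w_k^*(w_i) = \delta_{ki}$ and $v_i^*(v_\ell) = \delta_{i\ell}$. □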


Because of the close relationship between the transposed matrix and the dual map, 
some authors call the dual map f* the transpose of the linear map f. 

Applied to matrices, Lemma 11.6 and Theorem 11.7 yield the following rules 
known from Chap. 4: 


\[ (AB)^T = B^T A^T \ \text{ for } A \in K^{n,m} \text{ and } B \in K^{m,s}, \quad \text{and} \quad (A^{-1})^T = (A^T)^{-1} \ \text{ for } A \in GL_n(K). \]


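Example 11.8 For the two bases of $\mathbb{R}^{2,1}$,

\[
B_1 = \left\{ v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ v_2 = \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\}, \quad
B_2 = \left\{ w_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ w_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\},
\]

the elements of the corresponding dual bases are given by

\[
v_1^* : \mathbb{R}^{2,1} \to \mathbb{R}, \ \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto \alpha_1 + 0, \qquad
v_2^* : \mathbb{R}^{2,1} \to \mathbb{R}, \ \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto 0 + \tfrac{1}{2} \alpha_2,
\]
\[
w_1^* : \mathbb{R}^{2,1} \to \mathbb{R}, \ \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto \alpha_1 - \alpha_2, \qquad
w_2^* : \mathbb{R}^{2,1} \to \mathbb{R}, \ \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto 0 + \alpha_2.
\]

The matrix representations of these maps are

\[
[v_1^*]_{B_1,\{1\}} = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad [v_2^*]_{B_1,\{1\}} = \begin{bmatrix} 0 & 1 \end{bmatrix}, \quad
[w_1^*]_{B_2,\{1\}} = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad [w_2^*]_{B_2,\{1\}} = \begin{bmatrix} 0 & 1 \end{bmatrix}.
\]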


For the linear map 


f: R2! _5 R2! H = K SF j 


Q2 
[fles = f a Lf lesa = E A |


we have 


11.2 Bilinear Forms


We now consider special maps from a pair of K -vector spaces to the K -vector space 
K. 


Definition 11.9 Let $V$ and $W$ be $K$-vector spaces. A map $\beta : V \times W \to K$ is called a bilinear form on $V \times W$, when

(1) $\beta(v_1 + v_2, w) = \beta(v_1, w) + \beta(v_2, w)$,
(2) $\beta(v, w_1 + w_2) = \beta(v, w_1) + \beta(v, w_2)$,
(3) $\beta(\lambda v, w) = \beta(v, \lambda w) = \lambda \beta(v, w)$,

hold for all $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$, and $\lambda \in K$.

A bilinear form $\beta$ is called non-degenerate in the first variable, if $\beta(v, w) = 0$ for all $w \in W$ implies that $v = 0$. Analogously, it is called non-degenerate in the second variable, if $\beta(v, w) = 0$ for all $v \in V$ implies that $w = 0$. If $\beta$ is non-degenerate in both variables, then $\beta$ is called non-degenerate and the spaces $V$, $W$ are called a dual pair with respect to $\beta$.

If $V = W$, then $\beta$ is called a bilinear form on $V$. If additionally $\beta(v, w) = \beta(w, v)$ holds for all $v, w \in V$, then $\beta$ is called symmetric. Otherwise, $\beta$ is called nonsymmetric.


Example 11.10 
(1) If $A \in K^{n,m}$, then

\[ \beta : K^{m,1} \times K^{n,1} \to K, \quad (v, w) \mapsto w^T A v, \]

is a bilinear form on $K^{m,1} \times K^{n,1}$ that is non-degenerate if and only if $n = m$ and $A \in GL_n(K)$ (cp. Exercise 11.10).
(2) The bilinear form

\[ \beta : \mathbb{R}^{2,1} \times \mathbb{R}^{2,1} \to \mathbb{R}, \quad (x, y) \mapsto y^T \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} x, \]

is degenerate in both variables: For $\hat{x} = [1, -1]^T$, we have $\beta(\hat{x}, y) = 0$ for all $y \in \mathbb{R}^{2,1}$; for $\hat{y} = [1, -1]^T$ we have $\beta(x, \hat{y}) = 0$ for all $x \in \mathbb{R}^{2,1}$. The set of all $x = [x_1, x_2]^T \in \mathbb{R}^{2,1}$ with $\beta(x, x) = 1$ is equal to the solution set of the quadratic equation in two variables $x_1^2 + 2 x_1 x_2 + x_2^2 = 1$, or $(x_1 + x_2)^2 = 1$, for $x_1, x_2 \in \mathbb{R}$. Geometrically, this set is given by the two straight lines $x_1 + x_2 = 1$ and $x_1 + x_2 = -1$ in the cartesian coordinate system of $\mathbb{R}^2$.
(3) If $V$ is a $K$-vector space, then

\[ \beta : V \times V^* \to K, \quad (v, f) \mapsto f(v), \]

is a bilinear form on $V \times V^*$, since

\[
\begin{aligned}
\beta(v_1 + v_2, f) &= f(v_1 + v_2) = f(v_1) + f(v_2) = \beta(v_1, f) + \beta(v_2, f), \\
\beta(v, f_1 + f_2) &= (f_1 + f_2)(v) = f_1(v) + f_2(v) = \beta(v, f_1) + \beta(v, f_2), \\
\beta(\lambda v, f) &= f(\lambda v) = \lambda f(v) = \lambda \beta(v, f) = (\lambda f)(v) = \beta(v, \lambda f),
\end{aligned}
\]

hold for all $v, v_1, v_2 \in V$, $f, f_1, f_2 \in V^*$ and $\lambda \in K$. This bilinear form is non-degenerate and thus $V$, $V^*$ are a dual pair with respect to $\beta$ (cp. Exercise 11.11 for the case $\dim(V) \in \mathbb{N}$).
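The degeneracy claimed in part (2) of this example is easy to probe numerically. The Python sketch below is an illustration only; it assumes that the matrix of the form in (2) is the all-ones $2 \times 2$ matrix, as the quadratic equation $(x_1 + x_2)^2 = 1$ indicates, and the function name `beta` and the sample vectors are hypothetical.

```python
def beta(x, y, M):
    """Bilinear form (x, y) -> y^T M x for column vectors given as plain lists."""
    Mx = [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]
    return sum(yi * mi for yi, mi in zip(y, Mx))

M = [[1, 1], [1, 1]]      # assumed matrix of the form in part (2)
x_hat = [1, -1]           # the vector used in the example for both variables

# By bilinearity it suffices to test on a basis; two extra vectors are added for illustration.
samples = [[1, 0], [0, 1], [2, 5], [-3, 4]]

first_variable = [beta(x_hat, y, M) for y in samples]    # beta(x_hat, y) for several y
second_variable = [beta(x, x_hat, M) for x in samples]   # beta(x, x_hat) for several x
quadratic = [(beta(x, x, M), (x[0] + x[1]) ** 2) for x in samples]
```

All entries of `first_variable` and `second_variable` vanish, and each pair in `quadratic` has two equal components, matching $\beta(x, x) = (x_1 + x_2)^2$.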


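Definition 11.11 Let $V$ and $W$ be $K$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$, respectively. If $\beta$ is a bilinear form on $V \times W$, then

\[ [\beta]_{B_1 \times B_2} := [b_{ij}] \in K^{n,m}, \quad b_{ij} := \beta(v_j, w_i), \]

is called the matrix representation of $\beta$ with respect to the bases $B_1$ and $B_2$.

If $v = \sum_{j=1}^m \lambda_j v_j \in V$ and $w = \sum_{i=1}^n \mu_i w_i \in W$, then

\[ \beta(v, w) = \sum_{j=1}^m \sum_{i=1}^n \lambda_j \mu_i\, \beta(v_j, w_i) = \sum_{i=1}^n \mu_i \sum_{j=1}^m b_{ij} \lambda_j = (\Phi_{B_2}(w))^T\, [\beta]_{B_1 \times B_2}\, \Phi_{B_1}(v), \]

where we have used the coordinate map from Lemma 10.17.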


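Example 11.12 If $B_1 = \{e_1^{(m)}, \dots, e_m^{(m)}\}$ and $B_2 = \{e_1^{(n)}, \dots, e_n^{(n)}\}$ are the canonical bases of $K^{m,1}$ and $K^{n,1}$, respectively, and if $\beta$ is the bilinear form from (1) in Example 11.10 with $A = [a_{ij}] \in K^{n,m}$, then $[\beta]_{B_1 \times B_2} = [b_{ij}]$, where

\[ b_{ij} = \beta(e_j^{(m)}, e_i^{(n)}) = (e_i^{(n)})^T A\, e_j^{(m)} = a_{ij}, \]

and hence $[\beta]_{B_1 \times B_2} = A$.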


The following result shows that symmetric bilinear forms have symmetric matrix 
representations. 


Lemma 11.13 For a bilinear form $\beta$ on a finite dimensional vector space $V$ the following statements are equivalent:

(1) $\beta$ is symmetric.
(2) For every basis $B$ of $V$ the matrix $[\beta]_{B \times B}$ is symmetric.
(3) There exists a basis $B$ of $V$ such that $[\beta]_{B \times B}$ is symmetric.

Proof Exercise. □


We will now analyze the effect of a basis change on the matrix representation of 
a bilinear form. 


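Theorem 11.14 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1$, $\tilde{B}_1$ of $V$ and $B_2$, $\tilde{B}_2$ of $W$. If $\beta$ is a bilinear form on $V \times W$, then

\[ [\beta]_{B_1 \times B_2} = ([\mathrm{Id}_W]_{B_2,\tilde{B}_2})^T\, [\beta]_{\tilde{B}_1 \times \tilde{B}_2}\, [\mathrm{Id}_V]_{B_1,\tilde{B}_1}. \]

Proof Let $B_1 = \{v_1, \dots, v_m\}$, $\tilde{B}_1 = \{\tilde{v}_1, \dots, \tilde{v}_m\}$, $B_2 = \{w_1, \dots, w_n\}$, $\tilde{B}_2 = \{\tilde{w}_1, \dots, \tilde{w}_n\}$, and

\[ (v_1, \dots, v_m) = (\tilde{v}_1, \dots, \tilde{v}_m) P, \quad \text{where } P = [p_{ij}] = [\mathrm{Id}_V]_{B_1,\tilde{B}_1}, \]
\[ (w_1, \dots, w_n) = (\tilde{w}_1, \dots, \tilde{w}_n) Q, \quad \text{where } Q = [q_{ij}] = [\mathrm{Id}_W]_{B_2,\tilde{B}_2}. \]

With $[\beta]_{\tilde{B}_1 \times \tilde{B}_2} = [\tilde{b}_{ij}]$, where $\tilde{b}_{ij} = \beta(\tilde{v}_j, \tilde{w}_i)$, we then have

\[
\begin{aligned}
\beta(v_j, w_i) &= \beta\Big( \sum_{k=1}^m p_{kj} \tilde{v}_k, \sum_{\ell=1}^n q_{\ell i} \tilde{w}_\ell \Big) = \sum_{\ell=1}^n q_{\ell i} \sum_{k=1}^m \beta(\tilde{v}_k, \tilde{w}_\ell)\, p_{kj} \\
&= \sum_{\ell=1}^n q_{\ell i} \sum_{k=1}^m \tilde{b}_{\ell k}\, p_{kj} \\
&= \begin{bmatrix} q_{1i} \\ \vdots \\ q_{ni} \end{bmatrix}^T [\beta]_{\tilde{B}_1 \times \tilde{B}_2} \begin{bmatrix} p_{1j} \\ \vdots \\ p_{mj} \end{bmatrix},
\end{aligned}
\]

which implies that $[\beta]_{B_1 \times B_2} = Q^T [\beta]_{\tilde{B}_1 \times \tilde{B}_2} P$, and hence the assertion follows. □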


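If $V = W$ and $B_1$, $B_2$ are two bases of $V$, then we obtain the following special case of Theorem 11.14:

\[ [\beta]_{B_1 \times B_1} = ([\mathrm{Id}_V]_{B_1,B_2})^T\, [\beta]_{B_2 \times B_2}\, [\mathrm{Id}_V]_{B_1,B_2}. \]

The two matrix representations $[\beta]_{B_1 \times B_1}$ and $[\beta]_{B_2 \times B_2}$ of $\beta$ in this case are congruent, which we formally define as follows.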
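The special case above can be checked on a small example. In the Python sketch below, which is an illustration and not part of the original text, the form is $\beta(x, y) = y^T M x$ on $\mathbb{Q}^{2,1}$ with a hypothetical matrix `M`, the role of $B_2$ is played by the canonical basis (so that $[\beta]_{B_2 \times B_2} = M$), and $B_1$ consists of two hypothetical vectors whose coordinates form the columns of `P`.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def beta(x, y, M):
    """Bilinear form (x, y) -> y^T M x for column vectors given as plain lists."""
    Mx = [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]
    return sum(yi * mi for yi, mi in zip(y, Mx))

M = [[2, 1], [0, 3]]            # [beta] with respect to the canonical basis (hypothetical values)
new_basis = [[1, 1], [1, 2]]    # the vectors of B1, written in canonical coordinates
P = transpose(new_basis)        # columns of P are the B1 vectors, i.e. P = [Id_V]_{B1,B2}

# Entry (i, j) of [beta]_{B1 x B1} is beta(v_j, v_i), following Definition 11.11.
direct = [[beta(new_basis[j], new_basis[i], M) for j in range(2)] for i in range(2)]
via_congruence = matmul(matmul(transpose(P), M), P)
```

Both computations produce the same matrix, so the representation in the new basis is obtained from `M` by the congruence transformation $P^T M P$.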


Definition 11.15 If for two matrices $A, B \in K^{n,n}$ there exists a matrix $Z \in GL_n(K)$ with $B = Z^T A Z$, then $A$ and $B$ are called congruent.

Lemma 11.16 Congruence is an equivalence relation on the set $K^{n,n}$.

Proof Exercise. □

11.3 Sesquilinear Forms


For complex vector spaces we introduce another special class of forms. 


Definition 11.17 Let $V$ and $W$ be $\mathbb{C}$-vector spaces. A map $s : V \times W \to \mathbb{C}$ is called a sesquilinear form on $V \times W$, when

(1) $s(v_1 + v_2, w) = s(v_1, w) + s(v_2, w)$,
(2) $s(\lambda v, w) = \lambda s(v, w)$,
(3) $s(v, w_1 + w_2) = s(v, w_1) + s(v, w_2)$,
(4) $s(v, \lambda w) = \overline{\lambda} s(v, w)$,

hold for all $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$ and $\lambda \in \mathbb{C}$.
If $V = W$, then $s$ is called a sesquilinear form on $V$. If additionally $s(v, w) = \overline{s(w, v)}$ holds for all $v, w \in V$, then $s$ is called Hermitian.¹

The prefix sesqui is Latin and means "one and a half". Note that a sesquilinear form is linear in the first variable and semilinear ("half linear") in the second variable.
The following result characterizes Hermitian sesquilinear forms.


Lemma 11.18 A sesquilinear form $s$ on the $\mathbb{C}$-vector space $V$ is Hermitian if and only if $s(v, v) \in \mathbb{R}$ for all $v \in V$.

Proof If $s$ is Hermitian then, in particular, $s(v, v) = \overline{s(v, v)}$ for all $v \in V$, and thus $s(v, v) \in \mathbb{R}$.
If, on the other hand, $v, w \in V$, then by definition

\[ s(v + w, v + w) = s(v, v) + s(v, w) + s(w, v) + s(w, w), \tag{11.1} \]
\[ s(v + \mathrm{i}w, v + \mathrm{i}w) = s(v, v) + \mathrm{i} s(w, v) - \mathrm{i} s(v, w) + s(w, w). \tag{11.2} \]

The first equation implies that $s(v, w) + s(w, v) \in \mathbb{R}$, since $s(v + w, v + w), s(v, v), s(w, w) \in \mathbb{R}$ by assumption. The second equation implies analogously that $\mathrm{i} s(w, v) - \mathrm{i} s(v, w) \in \mathbb{R}$. Therefore,

\[ s(v, w) + s(w, v) = \overline{s(v, w)} + \overline{s(w, v)}, \]
\[ -\mathrm{i} s(v, w) + \mathrm{i} s(w, v) = \mathrm{i}\, \overline{s(v, w)} - \mathrm{i}\, \overline{s(w, v)}. \]

Multiplying the second equation with $\mathrm{i}$ and adding the resulting equation to the first we obtain $s(v, w) = \overline{s(w, v)}$. □





Corollary 11.19 For a sesquilinear form $s$ on the $\mathbb{C}$-vector space $V$ we have

\[ 2 s(v, w) = s(v + w, v + w) + \mathrm{i}\, s(v + \mathrm{i}w, v + \mathrm{i}w) - (\mathrm{i} + 1)\,(s(v, v) + s(w, w)) \]

for all $v, w \in V$.

Proof The result follows from multiplication of (11.2) with $\mathrm{i}$ and adding the result to (11.1). □

Corollary 11.19 shows that a sesquilinear form $s$ on a $\mathbb{C}$-vector space $V$ is uniquely determined by the values of $s(v, v)$ for all $v \in V$.

¹ Charles Hermite (1822-1901).
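The identity of Corollary 11.19 can be tested with Python's built-in complex numbers. The sketch below is illustrative only: it uses the sesquilinear form $s(v, w) = w^H A v$ on $\mathbb{C}^{2,1}$ (the form of Example 11.22 with $n = m = 2$), and the matrix `A` and the vectors `v`, `w` are hypothetical values with small integer parts, so that the floating point arithmetic involved is exact.

```python
def s(v, w, A):
    """Sesquilinear form s(v, w) = w^H A v: linear in v, semilinear in w."""
    Av = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
    return sum(complex(wi).conjugate() * a for wi, a in zip(w, Av))

def add(x, y):
    """Sum of two vectors given as lists."""
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    """Scalar multiple c * x of a vector given as a list."""
    return [c * a for a in x]

A = [[1 + 2j, 3], [-1j, 2]]     # hypothetical matrix, not Hermitian
v = [1, 1j]
w = [2 - 1j, 1]

v_plus_w = add(v, w)
v_plus_iw = add(v, scale(1j, w))

lhs = 2 * s(v, w, A)
rhs = (s(v_plus_w, v_plus_w, A)
       + 1j * s(v_plus_iw, v_plus_iw, A)
       - (1j + 1) * (s(v, v, A) + s(w, w, A)))
```

The values `lhs` and `rhs` agree, so $s(v, w)$ is recovered from values of the form $s(u, u)$ alone, even though this particular form is not Hermitian.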


Definition 11.20 The Hermitian transpose of $A = [a_{ij}] \in \mathbb{C}^{n,m}$ is the matrix

\[ A^H := [\overline{a}_{ij}]^T \in \mathbb{C}^{m,n}. \]

If $A = A^H$, then $A$ is called Hermitian.

If a matrix $A$ has real entries, then obviously $A^H = A^T$. Thus, a real symmetric matrix is also Hermitian. If $A = [a_{ij}] \in \mathbb{C}^{n,n}$ is Hermitian, then in particular $a_{ii} = \overline{a}_{ii}$ for $i = 1, \dots, n$, i.e., Hermitian matrices have real diagonal entries.

The Hermitian transposition satisfies similar rules as the (usual) transposition (cp. Lemma 4.6).


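Lemma 11.21 For $A, \hat{A} \in \mathbb{C}^{n,m}$, $B \in \mathbb{C}^{m,\ell}$ and $\lambda \in \mathbb{C}$ the following assertions hold:

(1) $(A^H)^H = A$.

(2) $(A + \hat{A})^H = A^H + \hat{A}^H$.

(3) $(\lambda A)^H = \overline{\lambda} A^H$.

(4) $(AB)^H = B^H A^H$.

Proof Exercise. □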


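Example 11.22 For $A \in \mathbb{C}^{n,m}$ the map

\[ s : \mathbb{C}^{m,1} \times \mathbb{C}^{n,1} \to \mathbb{C}, \quad (v, w) \mapsto w^H A v, \]

is a sesquilinear form.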


The matrix representation of a sesquilinear form is defined analogously to the 
matrix representation of bilinear forms (cp. Definition 11.11). 


Definition 11.23 Let $V$ and $W$ be $\mathbb{C}$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$, respectively. If $s$ is a sesquilinear form on $V \times W$, then

\[ [s]_{B_1 \times B_2} := [b_{ij}] \in \mathbb{C}^{n,m}, \quad b_{ij} := s(v_j, w_i), \]

is called the matrix representation of $s$ with respect to the bases $B_1$ and $B_2$.


Example 11.24 If $B_1 = \{e_1^{(m)}, \dots, e_m^{(m)}\}$ and $B_2 = \{e_1^{(n)}, \dots, e_n^{(n)}\}$ are the canonical bases of $\mathbb{C}^{m,1}$ and $\mathbb{C}^{n,1}$, respectively, and $s$ is the sesquilinear form of Example 11.22 with $A = [a_{ij}] \in \mathbb{C}^{n,m}$, then $[s]_{B_1 \times B_2} = [b_{ij}]$ with

\[ b_{ij} = s(e_j^{(m)}, e_i^{(n)}) = (e_i^{(n)})^H A\, e_j^{(m)} = a_{ij}, \]

and, hence, $[s]_{B_1 \times B_2} = A$.

Exercises 

(In the following exercises K is an arbitrary field.) 


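11.1. Let $\mathcal{V}$ be a finite dimensional $K$-vector space and $v \in \mathcal{V}$. Show that $f(v) = 0$ for all $f \in \mathcal{V}^*$ if and only if $v = 0$.

11.2. Consider the basis $B = \{10, t - 1, t^2 - t\}$ of the 3-dimensional vector space $\mathbb{R}[t]_{\leq 2}$. Compute the dual basis $B^*$ to $B$.

11.3. Let $\mathcal{V}$ be an $n$-dimensional $K$-vector space and let $\{v_1^*, \dots, v_n^*\}$ be a basis of $\mathcal{V}^*$. Prove or disprove: There exists a unique basis $\{v_1, \dots, v_n\}$ of $\mathcal{V}$ with $v_i^*(v_j) = \delta_{ij}$.

11.4. Let $\mathcal{V}$ be a finite dimensional $K$-vector space and let $f, g \in \mathcal{V}^*$ with $f \neq 0$. Show that $g = \lambda f$ for a $\lambda \in K \setminus \{0\}$ holds if and only if $\ker(f) = \ker(g)$. Is it possible to omit the assumption $f \neq 0$?

11.5. Let $\mathcal{V}$ be a $K$-vector space and let $\mathcal{U}$ be a subspace of $\mathcal{V}$. The set

$$\mathcal{U}^0 := \{f \in \mathcal{V}^* \mid f(u) = 0 \text{ for all } u \in \mathcal{U}\}$$

is called the annihilator of $\mathcal{U}$. Show the following assertions:
(a) $\mathcal{U}^0$ is a subspace of $\mathcal{V}^*$.
(b) For subspaces $\mathcal{U}_1, \mathcal{U}_2$ of $\mathcal{V}$ we have
$$(\mathcal{U}_1 + \mathcal{U}_2)^0 = \mathcal{U}_1^0 \cap \mathcal{U}_2^0, \quad (\mathcal{U}_1 \cap \mathcal{U}_2)^0 = \mathcal{U}_1^0 + \mathcal{U}_2^0,$$
and if $\mathcal{U}_1 \subseteq \mathcal{U}_2$, then $\mathcal{U}_2^0 \subseteq \mathcal{U}_1^0$.
(c) If $\mathcal{W}$ is a $K$-vector space and $f \in \mathcal{L}(\mathcal{V}, \mathcal{W})$, then $\ker(f^*) = (\operatorname{im}(f))^0$.

11.6. Prove Lemma 11.6 (2) and (3).

11.7. Let $\mathcal{V}$ and $\mathcal{W}$ be $K$-vector spaces. Show that the set of all bilinear forms on $\mathcal{V} \times \mathcal{W}$ with the operations

$$+ : \ (\beta_1 + \beta_2)(v, w) := \beta_1(v, w) + \beta_2(v, w),$$
$$\cdot : \ (\lambda \cdot \beta)(v, w) := \lambda \cdot \beta(v, w),$$

is a $K$-vector space.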


11.8. Let $\mathcal{V}$ and $\mathcal{W}$ be $K$-vector spaces with bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$ and corresponding dual bases $\{v_1^*, \dots, v_m^*\}$ and $\{w_1^*, \dots, w_n^*\}$, respectively. For $i = 1, \dots, m$ and $j = 1, \dots, n$ let

$$\beta_{ij} : \mathcal{V} \times \mathcal{W} \to K, \quad (v, w) \mapsto v_i^*(v)\, w_j^*(w).$$

(a) Show that $\beta_{ij}$ is a bilinear form on $\mathcal{V} \times \mathcal{W}$.
(b) Show that the set $\{\beta_{ij} \mid i = 1, \dots, m,\ j = 1, \dots, n\}$ is a basis of the $K$-vector space of bilinear forms on $\mathcal{V} \times \mathcal{W}$ (cp. Exercise 11.7) and determine the dimension of this space.

11.9. Let $\mathcal{V}$ be the $\mathbb{R}$-vector space of the continuous and real valued functions on the real interval $[\alpha, \beta]$. Show that

$$\beta : \mathcal{V} \times \mathcal{V} \to \mathbb{R}, \quad (f, g) \mapsto \int_\alpha^\beta f(x) g(x)\, dx,$$

is a symmetric bilinear form on $\mathcal{V}$. Is $\beta$ degenerate?

11.10. Show that the map $\beta$ from (1) in Example 11.10 is a bilinear form, and show that it is non-degenerate if and only if $n = m$ and $A \in GL_n(K)$.

11.11. Let $\mathcal{V}$ be a finite dimensional $K$-vector space. Show that $\mathcal{V}, \mathcal{V}^*$ is a dual pair with respect to the bilinear form $\beta$ from (3) in Example 11.10, i.e., that the bilinear form $\beta$ is non-degenerate.

11.12. Let $\mathcal{V}$ be a finite dimensional $K$-vector space and let $\mathcal{U} \subseteq \mathcal{V}$ and $\mathcal{W} \subseteq \mathcal{V}^*$ be subspaces with $\dim(\mathcal{U}) = \dim(\mathcal{W}) \geq 1$. Prove or disprove: The spaces $\mathcal{U}, \mathcal{W}$ form a dual pair with respect to the bilinear form $\beta : \mathcal{U} \times \mathcal{W} \to K$, $(v, h) \mapsto h(v)$.

11.13. Let $\mathcal{V}$ and $\mathcal{W}$ be finite dimensional $K$-vector spaces with the bases $B_1$ and $B_2$, respectively, and let $\beta$ be a bilinear form on $\mathcal{V} \times \mathcal{W}$.
(a) Show that the following statements are equivalent:
    (1) $[\beta]_{B_1 \times B_2}$ is not invertible.
    (2) $\beta$ is degenerate in the second variable.
    (3) $\beta$ is degenerate in the first variable.
(b) Conclude from (a): $\beta$ is non-degenerate if and only if $[\beta]_{B_1 \times B_2}$ is invertible.

11.14. Prove Lemma 11.16.

11.15. Prove Lemma 11.13.

11.16. For a bilinear form $\beta$ on a $K$-vector space $\mathcal{V}$, the map $q_\beta : \mathcal{V} \to K$, $v \mapsto \beta(v, v)$, is called the quadratic form induced by $\beta$. Show the following assertion: If $1 + 1 \neq 0$ in $K$ and $\beta$ is symmetric, then

$$\beta(v, w) = \tfrac{1}{2}\big(q_\beta(v + w) - q_\beta(v) - q_\beta(w)\big)$$

holds for all $v, w \in \mathcal{V}$.

11.17. Show that a sesquilinear form $s$ on a $\mathbb{C}$-vector space $\mathcal{V}$ satisfies the polarization identity

$$s(v, w) = \tfrac{1}{4}\big(s(v + w, v + w) - s(v - w, v - w) + i\, s(v + iw, v + iw) - i\, s(v - iw, v - iw)\big)$$

for all $v, w \in \mathcal{V}$.

11.18. Consider the following maps from $\mathbb{C}^{3,1} \times \mathbb{C}^{3,1}$ to $\mathbb{C}$:
(a) $\beta_1(x, y) = 3x_1\overline{x}_1 + 3y_1\overline{y}_1 + x_2\overline{y}_3 - x_3\overline{y}_2$,
(b) $\beta_2(x, y) = x_1\overline{y}_2 + x_2\overline{y}_3 + x_3\overline{y}_1$,
(c) $\beta_3(x, y) = x_1 y_2 + x_2 y_3 + x_3 y_1$,
(d) $\beta_4(x, y) = 3x_1\overline{y}_1 + x_1\overline{y}_2 + x_2\overline{y}_1 + 2i x_2\overline{y}_3 - 2i x_3\overline{y}_2 + x_3\overline{y}_3$.
Which of these are bilinear forms or sesquilinear forms on $\mathbb{C}^{3,1}$? Test whether the bilinear form is symmetric or the sesquilinear form is Hermitian, and derive the corresponding matrix representations with respect to the canonical basis $B_1 = \{e_1, e_2, e_3\}$ and the basis $B_2 = \{e_1, e_1 + i e_2, e_2 + i e_3\}$.

11.19. Prove Lemma 11.21.

11.20. Let $A \in \mathbb{C}^{n,n}$ be Hermitian. Show that

$$s : \mathbb{C}^{n,1} \times \mathbb{C}^{n,1} \to \mathbb{C}, \quad (v, w) \mapsto w^H A v,$$

is a Hermitian sesquilinear form on $\mathbb{C}^{n,1}$.

11.21. Let $\mathcal{V}$ be a finite dimensional $\mathbb{C}$-vector space with the basis $B$, and let $s$ be a sesquilinear form on $\mathcal{V}$. Show that $s$ is Hermitian if and only if $[s]_{B \times B}$ is Hermitian.

11.22. Show the following assertions for $A, B \in \mathbb{C}^{n,n}$:
(a) If $A^H = -A$, then the eigenvalues of $A$ are purely imaginary.
(b) If $A^H = -A$, then $\operatorname{trace}(A^2) \leq 0$ and $(\operatorname{trace}(A))^2 \leq 0$.
(c) If $A^H = A$ and $B^H = B$, then $\operatorname{trace}((AB)^2) \leq \operatorname{trace}(A^2 B^2)$.


Chapter 12 
Euclidean and Unitary Vector Spaces 


In this chapter we study vector spaces over the fields $\mathbb{R}$ and $\mathbb{C}$. Using the definition of
bilinear and sesquilinear forms, we introduce scalar products on such vector spaces.
Scalar products allow the extension of well-known concepts from elementary geometry,
such as length and angles, to abstract real and complex vector spaces. This,
in particular, leads to the idea of orthogonality and to orthonormal bases of vector
spaces. As an example of the importance of these concepts in many applications we
study least-squares approximations.


12.1 Scalar Products and Norms 


We start with the definition of a scalar product and the Euclidean or unitary vector 
spaces. 


Definition 12.1 Let $\mathcal{V}$ be a $K$-vector space, where either $K = \mathbb{R}$ or $K = \mathbb{C}$. A map

$$\langle \cdot, \cdot \rangle : \mathcal{V} \times \mathcal{V} \to K, \quad (v, w) \mapsto \langle v, w \rangle,$$

is called a scalar product on $\mathcal{V}$, when the following properties hold:

(1) If $K = \mathbb{R}$, then $\langle \cdot, \cdot \rangle$ is a symmetric bilinear form.
    If $K = \mathbb{C}$, then $\langle \cdot, \cdot \rangle$ is a Hermitian sesquilinear form.
(2) $\langle \cdot, \cdot \rangle$ is positive definite, i.e., $\langle v, v \rangle \geq 0$ holds for all $v \in \mathcal{V}$, with equality if and only if $v = 0$.

An $\mathbb{R}$-vector space with a scalar product is called a Euclidean vector space¹, and a
$\mathbb{C}$-vector space with a scalar product is called a unitary vector space.

Scalar products are sometimes called inner products. Note that $\langle v, v \rangle$ is nonnegative
and real also when $\mathcal{V}$ is a $\mathbb{C}$-vector space. It is easy to see that a subspace $\mathcal{U}$ of
a Euclidean or unitary vector space $\mathcal{V}$ is again a Euclidean or unitary vector space,
respectively, when the scalar product on the space $\mathcal{V}$ is restricted to the subspace $\mathcal{U}$.

¹ Euclid of Alexandria (approx. 300 BC).


Example 12.2
(1) A scalar product on $\mathbb{R}^{n,1}$ is given by

$$\langle v, w \rangle := w^T v.$$

It is called the standard scalar product of $\mathbb{R}^{n,1}$.
(2) A scalar product on $\mathbb{C}^{n,1}$ is given by

$$\langle v, w \rangle := w^H v.$$

It is called the standard scalar product of $\mathbb{C}^{n,1}$.
(3) For both $K = \mathbb{R}$ and $K = \mathbb{C}$,

$$\langle A, B \rangle := \operatorname{trace}(B^H A)$$

is a scalar product on $K^{n,m}$.
(4) A scalar product on the vector space of the continuous and real valued functions
on the real interval $[\alpha, \beta]$ is given by

$$\langle f, g \rangle := \int_\alpha^\beta f(x) g(x)\, dx.$$
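
The scalar products (2) and (3) of Example 12.2 can be tried out numerically. The following is a minimal sketch in plain Python, assuming vectors are stored as lists and matrices as lists of rows; the function names `scalar_product` and `trace_product` and the sample vectors are illustrative choices and not part of the text.

```python
def scalar_product(v, w):
    """Standard scalar product <v, w> = w^H v on C^{n,1} (vectors as lists)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(v, w))


def trace_product(A, B):
    """<A, B> = trace(B^H A) on K^{n,m} (matrices as lists of rows).

    trace(B^H A) equals the sum of a_ij * conj(b_ij) over all entries.
    """
    return sum(complex(a) * complex(b).conjugate()
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


# Hypothetical sample data.
v = [1 + 2j, 3j]
w = [2, 1 - 1j]

print(scalar_product(v, w))       # <v, w>
print(scalar_product(w, v))       # the complex conjugate of <v, w>
print(scalar_product(v, v).real)  # real and positive for v != 0
print(trace_product([[1, 2], [3, 4]], [[1, 2], [3, 4]]).real)
```

The second printed value illustrates the Hermitian property $\langle w, v \rangle = \overline{\langle v, w \rangle}$, and the third illustrates positive definiteness for one particular $v \neq 0$.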


We will now show how to use the Euclidean or unitary structure of a vector space 
in order to introduce geometric concepts such as the length of a vector or the angle 
between vectors. 

As a motivation for a general concept of length we have the absolute value of
real numbers, i.e., the map $|\cdot| : \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$. This map has the following
properties:

(1) $|\lambda x| = |\lambda| \cdot |x|$ for all $\lambda, x \in \mathbb{R}$.
(2) $|x| \geq 0$ for all $x \in \mathbb{R}$, with equality if and only if $x = 0$.
(3) $|x + y| \leq |x| + |y|$ for all $x, y \in \mathbb{R}$.


These properties are generalized to real or complex vector spaces as follows. 


Definition 12.3 Let $\mathcal{V}$ be a $K$-vector space, where either $K = \mathbb{R}$ or $K = \mathbb{C}$. A map

$$\|\cdot\| : \mathcal{V} \to \mathbb{R}, \quad v \mapsto \|v\|,$$

is called a norm on $\mathcal{V}$, when for all $v, w \in \mathcal{V}$ and $\lambda \in K$ the following properties
hold:

(1) $\|\lambda v\| = |\lambda| \cdot \|v\|$.
(2) $\|v\| \geq 0$, with equality if and only if $v = 0$.
(3) $\|v + w\| \leq \|v\| + \|w\|$ (triangle inequality).

A $K$-vector space on which a norm is defined is called a normed space.


Example 12.4
(1) If $\langle \cdot, \cdot \rangle$ is the standard scalar product on $\mathbb{R}^{n,1}$, then

$$\|v\| := \langle v, v \rangle^{1/2} = (v^T v)^{1/2}$$

defines a norm that is called the Euclidean norm of $\mathbb{R}^{n,1}$.
(2) If $\langle \cdot, \cdot \rangle$ is the standard scalar product on $\mathbb{C}^{n,1}$, then

$$\|v\| := \langle v, v \rangle^{1/2} = (v^H v)^{1/2}$$

defines a norm that is called the Euclidean norm of $\mathbb{C}^{n,1}$. (This is common
terminology, although the space itself is unitary and not Euclidean.)
(3) For both $K = \mathbb{R}$ and $K = \mathbb{C}$,

$$\|A\|_F := (\operatorname{trace}(A^H A))^{1/2} = \Big(\sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2\Big)^{1/2}$$

is a norm on $K^{n,m}$ that is called the Frobenius norm² of $K^{n,m}$. For $m = 1$ the
Frobenius norm is equal to the Euclidean norm of $K^{n,1}$. Moreover, the Frobenius
norm of $K^{n,m}$ is equal to the Euclidean norm of $K^{nm,1}$ (or $K^{nm}$), if we identify
these vector spaces via an isomorphism.
Obviously, we have $\|A\|_F = \|A^T\|_F = \|A^H\|_F$ for all $A \in K^{n,m}$.
(4) If $\mathcal{V}$ is the vector space of the continuous and real valued functions on the real
interval $[\alpha, \beta]$, then

$$\|f\| := \langle f, f \rangle^{1/2} = \Big(\int_\alpha^\beta (f(x))^2\, dx\Big)^{1/2}$$

is a norm on $\mathcal{V}$ that is called the $L^2$-norm.
(5) Let $K = \mathbb{R}$ or $K = \mathbb{C}$, and let $p \in \mathbb{R}$, $p \geq 1$ be given. Then for
$v = [\nu_1, \dots, \nu_n]^T \in K^{n,1}$ the $p$-norm of $K^{n,1}$ is defined by

$$\|v\|_p := \Big(\sum_{i=1}^{n} |\nu_i|^p\Big)^{1/p}. \qquad (12.1)$$

For $p = 2$ this is the Euclidean norm on $K^{n,1}$. For this norm we typically omit
the index 2 and write $\|\cdot\|$ instead of $\|\cdot\|_2$ (as in (1) and (2) above). Taking the
limit $p \to \infty$ in (12.1), we obtain the $\infty$-norm of $K^{n,1}$, given by

$$\|v\|_\infty = \max_{1 \leq i \leq n} |\nu_i|.$$

The following figures illustrate the unit circle in $\mathbb{R}^{2,1}$ with respect to the $p$-norm,
i.e., the set of all $v \in \mathbb{R}^{2,1}$ with $\|v\|_p = 1$, for $p = 1$, $p = 2$ and $p = \infty$:

[Figure: unit circles in $\mathbb{R}^{2,1}$ for $p = 1$, $p = 2$ and $p = \infty$.]

(6) For $K = \mathbb{R}$ or $K = \mathbb{C}$ the $p$-norm of $K^{n,m}$ is defined by

$$\|A\|_p := \sup_{v \in K^{m,1} \setminus \{0\}} \frac{\|Av\|_p}{\|v\|_p}.$$

Here we use the $p$-norm of $K^{m,1}$ in the denominator and the $p$-norm of $K^{n,1}$ in
the numerator. The notation sup means supremum, i.e., the least upper bound
that is known from Analysis. One can show that the supremum is attained by a
vector $v$, and thus we may write max instead of sup in the definition above.
In particular, for $A = [a_{ij}] \in K^{n,m}$ we have

$$\|A\|_1 = \max_{1 \leq j \leq m} \sum_{i=1}^{n} |a_{ij}|,$$

$$\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^{m} |a_{ij}|.$$

² Ferdinand Georg Frobenius (1849-1917).


These norms are called maximum column sum and maximum row sum norm
of $K^{n,m}$, respectively. We easily see that $\|A\|_1 = \|A^T\|_\infty = \|A^H\|_\infty$ and
$\|A\|_\infty = \|A^T\|_1 = \|A^H\|_1$. However, for the matrix

$$A = \begin{bmatrix} 1/2 & -1/4 \\ -1/2 & 2/3 \end{bmatrix} \in \mathbb{R}^{2,2}$$

we have $\|A\|_1 = 1$ and $\|A\|_\infty = 7/6$. Thus, this matrix $A$ satisfies $\|A\|_1 < \|A\|_\infty$
and $\|A^T\|_\infty < \|A^T\|_1$. The 2-norm of matrices will be considered further in
Chap. 19.
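
The vector and matrix norms of Example 12.4 can be sketched in a few lines of plain Python. The function names below are illustrative, and the sample matrix `A` is an assumption: it is one $2 \times 2$ matrix whose maximum column sum and maximum row sum reproduce the values $1$ and $7/6$ quoted in the example.

```python
from fractions import Fraction


def p_norm(v, p):
    """p-norm (12.1) of a vector given as a list, for a real p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def max_norm(v):
    """Infinity-norm: the largest absolute value of an entry."""
    return max(abs(x) for x in v)


def norm_1(A):
    """Maximum column sum norm of a matrix given as a list of rows."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Maximum row sum norm of a matrix given as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)


def frobenius(A):
    """Frobenius norm: square root of the sum of |a_ij|^2."""
    return sum(abs(a) ** 2 for row in A for a in row) ** 0.5


# Assumed sample matrix, stored with exact fractions.
A = [[Fraction(1, 2), Fraction(-1, 4)],
     [Fraction(-1, 2), Fraction(2, 3)]]

print(norm_1(A), norm_inf(A))               # column sum norm, row sum norm
print(p_norm([3, -4], 1), p_norm([3, -4], 2))
print(p_norm([3, -4], 50), max_norm([3, -4]))  # large p approaches the max norm
```

Using `Fraction` keeps the column and row sums exact, so the comparison of $\|A\|_1$ and $\|A\|_\infty$ is not affected by rounding.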
The norms in the above examples (1)-(4) have the form $\|v\| = \langle v, v \rangle^{1/2}$, where
$\langle \cdot, \cdot \rangle$ is a given scalar product. We will show now that the map $v \mapsto \langle v, v \rangle^{1/2}$ always
defines a norm. Our proof is based on the following theorem.


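Theorem 12.5 If $\mathcal{V}$ is a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$, then

$$|\langle v, w \rangle|^2 \leq \langle v, v \rangle \cdot \langle w, w \rangle \quad \text{for all } v, w \in \mathcal{V}, \qquad (12.2)$$

with equality if and only if $v, w$ are linearly dependent.

Proof The inequality is trivial for $w = 0$. Thus, let $w \neq 0$ and let

$$\lambda := \frac{\langle v, w \rangle}{\langle w, w \rangle}.$$

Then

$$0 \leq \langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \overline{\lambda} \langle v, w \rangle - \lambda \langle w, v \rangle - \lambda(-\overline{\lambda}) \langle w, w \rangle$$
$$= \langle v, v \rangle - \frac{\overline{\langle v, w \rangle}}{\langle w, w \rangle} \langle v, w \rangle - \frac{\langle v, w \rangle}{\langle w, w \rangle} \overline{\langle v, w \rangle} + \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle^2} \langle w, w \rangle$$
$$= \langle v, v \rangle - \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle},$$

which implies (12.2).

If $v, w$ are linearly dependent (the case $w = 0$ being trivial), then $v = \lambda w$ for a scalar $\lambda$, and hence

$$|\langle v, w \rangle|^2 = |\langle \lambda w, w \rangle|^2 = |\lambda \langle w, w \rangle|^2 = |\lambda|^2 |\langle w, w \rangle|^2 = \lambda \overline{\lambda} \langle w, w \rangle \langle w, w \rangle$$
$$= \langle \lambda w, \lambda w \rangle \langle w, w \rangle = \langle v, v \rangle \langle w, w \rangle.$$

On the other hand, let $|\langle v, w \rangle|^2 = \langle v, v \rangle \langle w, w \rangle$. If $w = 0$, then $v, w$ are linearly
dependent. If $w \neq 0$, then we define $\lambda$ as above and get

$$\langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle} = 0.$$

Since the scalar product is positive definite, we have $v - \lambda w = 0$, and thus $v, w$ are
linearly dependent. □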
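
As a numerical illustration of (12.2), the following sketch evaluates the gap $\langle v, v \rangle \langle w, w \rangle - |\langle v, w \rangle|^2$ for the standard scalar product on $\mathbb{C}^{3,1}$. The helper names `sp` and `cs_gap` and the sample vectors are assumptions made for this example. The gap is positive for the linearly independent pair and vanishes (up to rounding) for the dependent pair.

```python
def sp(v, w):
    """Standard scalar product <v, w> = w^H v (vectors as lists)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(v, w))


def cs_gap(v, w):
    """<v, v><w, w> - |<v, w>|^2, which is nonnegative by (12.2)."""
    return (sp(v, v) * sp(w, w)).real - abs(sp(v, w)) ** 2


# Hypothetical sample vectors.
v = [1 + 1j, 2, -1j]
w = [3, 1j, 1]
u = [(2 - 1j) * x for x in w]   # u is a scalar multiple of w

print(cs_gap(v, w))   # strictly positive: v, w are linearly independent
print(cs_gap(u, w))   # zero up to rounding: u, w are linearly dependent
```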


The inequality (12.2) is called the Cauchy-Schwarz inequality.³ It is an important
tool in Analysis, in particular in the estimation of approximation and interpolation
errors.

³ Augustin Louis Cauchy (1789-1857) and Hermann Amandus Schwarz (1843-1921).

Corollary 12.6 If $\mathcal{V}$ is a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$, then the map

$$\|\cdot\| : \mathcal{V} \to \mathbb{R}, \quad v \mapsto \|v\| := \langle v, v \rangle^{1/2},$$

is a norm on $\mathcal{V}$ that is called the norm induced by the scalar product.

Proof We have to prove the three defining properties of the norm. Since $\langle \cdot, \cdot \rangle$ is
positive definite, we have $\|v\| \geq 0$, with equality if and only if $v = 0$. If $v \in \mathcal{V}$ and
$\lambda \in K$ (where in the Euclidean case $K = \mathbb{R}$ and in the unitary case $K = \mathbb{C}$), then

$$\|\lambda v\|^2 = \langle \lambda v, \lambda v \rangle = \lambda \overline{\lambda} \langle v, v \rangle = |\lambda|^2 \langle v, v \rangle,$$

and hence $\|\lambda v\| = |\lambda|\, \|v\|$. In order to show the triangle inequality, we use the
Cauchy-Schwarz inequality and the fact that $\operatorname{Re}(z) \leq |z|$ for every complex number
$z$. For all $v, w \in \mathcal{V}$ we have

$$\|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle$$
$$= \langle v, v \rangle + \langle v, w \rangle + \overline{\langle v, w \rangle} + \langle w, w \rangle$$
$$= \|v\|^2 + 2 \operatorname{Re}(\langle v, w \rangle) + \|w\|^2$$
$$\leq \|v\|^2 + 2 |\langle v, w \rangle| + \|w\|^2$$
$$\leq \|v\|^2 + 2 \|v\|\, \|w\| + \|w\|^2$$
$$= (\|v\| + \|w\|)^2,$$

and thus $\|v + w\| \leq \|v\| + \|w\|$. □


12.2 Orthogonality 


We will now use the scalar product to introduce angles between vectors. As motivation
we consider the Euclidean vector space $\mathbb{R}^{2,1}$ with the standard scalar product and the
induced Euclidean norm $\|v\| = \langle v, v \rangle^{1/2}$. The Cauchy-Schwarz inequality shows
that

$$-1 \leq \frac{\langle v, w \rangle}{\|v\|\, \|w\|} \leq 1 \quad \text{for all } v, w \in \mathbb{R}^{2,1} \setminus \{0\}.$$

If $v, w \in \mathbb{R}^{2,1} \setminus \{0\}$, then the angle between $v$ and $w$ is the uniquely determined real
number $\varphi \in [0, \pi]$ with

$$\cos(\varphi) = \frac{\langle v, w \rangle}{\|v\|\, \|w\|}.$$

The vectors $v, w$ are orthogonal if $\varphi = \pi/2$, so that $\cos(\varphi) = 0$. Thus, $v, w$ are
orthogonal if and only if $\langle v, w \rangle = 0$.

An elementary calculation now leads to the cosine theorem for triangles:

$$\|v - w\|^2 = \langle v - w, v - w \rangle = \langle v, v \rangle - 2 \langle v, w \rangle + \langle w, w \rangle$$
$$= \|v\|^2 + \|w\|^2 - 2 \|v\|\, \|w\| \cos(\varphi).$$

If $v, w$ are orthogonal, i.e., $\langle v, w \rangle = 0$, then the cosine theorem implies the
Pythagorean theorem⁴:

$$\|v - w\|^2 = \|v\|^2 + \|w\|^2.$$

The following figures illustrate the cosine theorem and the Pythagorean theorem for
vectors in $\mathbb{R}^{2,1}$:

[Figures: cosine theorem and Pythagorean theorem.]


In the following definition we generalize the ideas of angles and orthogonality.

Definition 12.7 Let $\mathcal{V}$ be a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$.

(1) In the Euclidean case, the angle between two vectors $v, w \in \mathcal{V} \setminus \{0\}$ is the
uniquely determined real number $\varphi \in [0, \pi]$ with

$$\cos(\varphi) = \frac{\langle v, w \rangle}{\|v\|\, \|w\|}.$$

(2) Two vectors $v, w \in \mathcal{V}$ are called orthogonal, if $\langle v, w \rangle = 0$.
(3) A basis $\{v_1, \dots, v_n\}$ of $\mathcal{V}$ is called an orthogonal basis, if

$$\langle v_i, v_j \rangle = 0, \quad i, j = 1, \dots, n \text{ and } i \neq j.$$

If, furthermore,

$$\|v_i\| = 1, \quad i = 1, \dots, n,$$

where $\|v\| = \langle v, v \rangle^{1/2}$ is the norm induced by the scalar product, then
$\{v_1, \dots, v_n\}$ is called an orthonormal basis of $\mathcal{V}$. (For an orthonormal basis
we therefore have $\langle v_i, v_j \rangle = \delta_{ij}$.)

⁴ Pythagoras of Samos (approx. 570-500 BC).

Note that the terms in (1)-(3) are defined with respect to the given scalar product.
Different scalar products yield different angles between vectors. In particular, the
orthogonality of two given vectors may be lost when we consider a different scalar
product.


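Example 12.8 The standard basis vectors $e_1, e_2 \in \mathbb{R}^{2,1}$ are orthogonal and $\{e_1, e_2\}$
is an orthonormal basis of $\mathbb{R}^{2,1}$ with respect to the standard scalar product (cp. (1) in
Example 12.2). Consider the symmetric and invertible matrix

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \in \mathbb{R}^{2,2},$$

which defines a symmetric and non-degenerate bilinear form on $\mathbb{R}^{2,1}$ by

$$(v, w) \mapsto w^T A v$$

(cp. (1) in Example 11.10). This bilinear form is positive definite, since for all $v =
[\nu_1, \nu_2]^T \in \mathbb{R}^{2,1}$ we have

$$v^T A v = \nu_1^2 + \nu_2^2 + (\nu_1 + \nu_2)^2.$$

The bilinear form therefore is a scalar product on $\mathbb{R}^{2,1}$, which we denote by $\langle \cdot, \cdot \rangle_A$.
We denote the induced norm by $\|\cdot\|_A$.
With respect to the scalar product $\langle \cdot, \cdot \rangle_A$ the vectors $e_1, e_2$ satisfy

$$\langle e_1, e_1 \rangle_A = e_1^T A e_1 = 2, \quad \langle e_2, e_2 \rangle_A = e_2^T A e_2 = 2, \quad \langle e_1, e_2 \rangle_A = e_2^T A e_1 = 1.$$

Clearly, $\{e_1, e_2\}$ is not an orthonormal basis of $\mathbb{R}^{2,1}$ with respect to $\langle \cdot, \cdot \rangle_A$. Also note
that $\|e_1\|_A = \|e_2\|_A = \sqrt{2}$.
On the other hand, the vectors $v_1 = [1, 1]^T$ and $v_2 = [-1, 1]^T$ satisfy

$$\langle v_1, v_1 \rangle_A = v_1^T A v_1 = 6, \quad \langle v_2, v_2 \rangle_A = v_2^T A v_2 = 2, \quad \langle v_1, v_2 \rangle_A = v_2^T A v_1 = 0.$$

Hence $\|v_1\|_A = \sqrt{6}$ and $\|v_2\|_A = \sqrt{2}$, so that $\{6^{-1/2} v_1, 2^{-1/2} v_2\}$ is an orthonormal
basis of $\mathbb{R}^{2,1}$ with respect to the scalar product $\langle \cdot, \cdot \rangle_A$.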
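
The values in Example 12.8 can be recomputed directly. The sketch below assumes the matrix $A$ with rows $(2, 1)$ and $(1, 2)$, which matches the values $\langle e_1, e_1 \rangle_A = 2$ and $\langle e_1, e_2 \rangle_A = 1$ stated in the example; the function name `sp_A` is an illustrative choice.

```python
import math

A = [[2, 1], [1, 2]]


def sp_A(v, w):
    """Scalar product <v, w>_A = w^T A v on R^{2,1} (vectors as lists)."""
    Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    return sum(w[i] * Av[i] for i in range(2))


e1, e2 = [1, 0], [0, 1]
v1, v2 = [1, 1], [-1, 1]

print(sp_A(e1, e1), sp_A(e2, e2), sp_A(e1, e2))  # e1, e2 are not A-orthogonal
print(sp_A(v1, v1), sp_A(v2, v2), sp_A(v1, v2))  # v1, v2 are A-orthogonal

# Normalizing with the induced norm gives an A-orthonormal basis.
u1 = [x / math.sqrt(sp_A(v1, v1)) for x in v1]
u2 = [x / math.sqrt(sp_A(v2, v2)) for x in v2]
print(sp_A(u1, u1), sp_A(u2, u2), sp_A(u1, u2))
```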


We now show that every finite dimensional Euclidean or unitary vector space has 
an orthonormal basis. 


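Theorem 12.9 Let $\mathcal{V}$ be a Euclidean or unitary vector space with the basis
$\{v_1, \dots, v_n\}$. Then there exists an orthonormal basis $\{u_1, \dots, u_n\}$ of $\mathcal{V}$ with

$$\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}, \quad k = 1, \dots, n.$$

Proof We give the proof by induction on $\dim(\mathcal{V}) = n$. If $n = 1$, then we set
$u_1 := \|v_1\|^{-1} v_1$. Then $\|u_1\| = 1$, and $\{u_1\}$ is an orthonormal basis of $\mathcal{V}$ with
$\operatorname{span}\{u_1\} = \operatorname{span}\{v_1\}$.

Let the assertion hold for an $n \geq 1$. Let $\dim(\mathcal{V}) = n + 1$ and let $\{v_1, \dots, v_{n+1}\}$
be a basis of $\mathcal{V}$. Then $\mathcal{V}_n := \operatorname{span}\{v_1, \dots, v_n\}$ is an $n$-dimensional subspace of $\mathcal{V}$. By
the induction hypothesis there exists an orthonormal basis $\{u_1, \dots, u_n\}$ of $\mathcal{V}_n$ with
$\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}$ for $k = 1, \dots, n$. We define

$$\widehat{u}_{n+1} := v_{n+1} - \sum_{k=1}^{n} \langle v_{n+1}, u_k \rangle u_k, \qquad u_{n+1} := \|\widehat{u}_{n+1}\|^{-1}\, \widehat{u}_{n+1}.$$

Since $v_{n+1} \notin \mathcal{V}_n = \operatorname{span}\{u_1, \dots, u_n\}$, we must have $\widehat{u}_{n+1} \neq 0$, and Lemma 9.16
yields $\operatorname{span}\{u_1, \dots, u_{n+1}\} = \operatorname{span}\{v_1, \dots, v_{n+1}\}$.
For $j = 1, \dots, n$ we have

$$\langle u_{n+1}, u_j \rangle = \langle \|\widehat{u}_{n+1}\|^{-1}\, \widehat{u}_{n+1}, u_j \rangle$$
$$= \|\widehat{u}_{n+1}\|^{-1} \Big( \langle v_{n+1}, u_j \rangle - \sum_{k=1}^{n} \langle v_{n+1}, u_k \rangle \langle u_k, u_j \rangle \Big)$$
$$= \|\widehat{u}_{n+1}\|^{-1} \big( \langle v_{n+1}, u_j \rangle - \langle v_{n+1}, u_j \rangle \big)$$
$$= 0.$$

Finally, $\langle u_{n+1}, u_{n+1} \rangle = \|\widehat{u}_{n+1}\|^{-2} \langle \widehat{u}_{n+1}, \widehat{u}_{n+1} \rangle = 1$, which completes the proof. □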


The proof of Theorem 12.9 shows how a given basis $\{v_1, \dots, v_n\}$ can be orthonormalized,
i.e., transformed into an orthonormal basis $\{u_1, \dots, u_n\}$ with

$$\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}, \quad k = 1, \dots, n.$$

The resulting algorithm is called the Gram-Schmidt method⁵:

Algorithm 12.10 Given a basis $\{v_1, \dots, v_n\}$ of $\mathcal{V}$.

(1) Set $u_1 := \|v_1\|^{-1} v_1$.
(2) For $j = 1, \dots, n - 1$ set

$$\widehat{u}_{j+1} := v_{j+1} - \sum_{k=1}^{j} \langle v_{j+1}, u_k \rangle u_k,$$
$$u_{j+1} := \|\widehat{u}_{j+1}\|^{-1}\, \widehat{u}_{j+1}.$$

⁵ Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959).

A slight reordering and combination of steps in the Gram-Schmidt method yields

$$(v_1, v_2, \dots, v_n) = (u_1, u_2, \dots, u_n) \begin{bmatrix} \|v_1\| & \langle v_2, u_1 \rangle & \cdots & \langle v_n, u_1 \rangle \\ & \|\widehat{u}_2\| & \ddots & \vdots \\ & & \ddots & \langle v_n, u_{n-1} \rangle \\ & & & \|\widehat{u}_n\| \end{bmatrix}.$$

The upper triangular matrix on the right hand side is the coordinate transformation
matrix from the basis $\{v_1, \dots, v_n\}$ to the basis $\{u_1, \dots, u_n\}$ of $\mathcal{V}$ (cp. Theorem 9.25
or 10.2). Thus, we have shown the following result.

Theorem 12.11 If $\mathcal{V}$ is a finite dimensional Euclidean or unitary vector space with a
given basis $B_1$, then the Gram-Schmidt method applied to $B_1$ yields an orthonormal
basis $B_2$ of $\mathcal{V}$, such that $[\mathrm{Id}_{\mathcal{V}}]_{B_1, B_2}$ is an invertible upper triangular matrix.
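
The Gram-Schmidt method and the upper triangular matrix it produces can be sketched as follows for the standard scalar product on $K^{n,1}$. This is a minimal illustration and not a numerically robust implementation: the function names, the list-based storage and the tolerance `1e-12` used to detect linear dependence are assumptions made for this example.

```python
import math


def sp(v, w):
    """Standard scalar product <v, w> = w^H v (vectors as lists)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(v, w))


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors v_1, ..., v_n.

    Returns (U, R), where U holds the orthonormal vectors u_1, ..., u_n and
    R is upper triangular with v_j = sum_k R[k][j] * u_k.
    """
    n = len(vectors)
    U = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(vectors):
        u_hat = list(v)
        for k, u in enumerate(U):
            R[k][j] = sp(v, u)                       # coefficient <v_j, u_k>
            u_hat = [x - R[k][j] * y for x, y in zip(u_hat, u)]
        R[j][j] = math.sqrt(sp(u_hat, u_hat).real)   # norm of u_hat
        if R[j][j] < 1e-12:                          # assumed tolerance
            raise ValueError("vectors are (numerically) linearly dependent")
        U.append([x / R[j][j] for x in u_hat])
    return U, R


# Hypothetical basis of R^{3,1}.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
U, R = gram_schmidt(vectors)
for row in R:
    print(row)
```

Writing the vectors in `U` as columns of a matrix $Q$ gives a factorization of the matrix with columns $v_1, \dots, v_n$ into $Q$ times the upper triangular matrix `R`.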


Consider an $m$-dimensional subspace of $\mathbb{R}^{n,1}$ or $\mathbb{C}^{n,1}$ with the standard scalar
product $\langle \cdot, \cdot \rangle$, and write the $m$ vectors of an orthonormal basis $\{q_1, \dots, q_m\}$ as columns
of a matrix, $Q := [q_1, \dots, q_m]$. Then we obtain in the real case

$$Q^T Q = [q_i^T q_j] = [\langle q_j, q_i \rangle] = [\delta_{ji}] = I_m,$$

and analogously in the complex case

$$Q^H Q = [q_i^H q_j] = [\langle q_j, q_i \rangle] = [\delta_{ji}] = I_m.$$

If, on the other hand, $Q^T Q = I_m$ or $Q^H Q = I_m$ for a matrix $Q \in \mathbb{R}^{n,m}$ or $Q \in \mathbb{C}^{n,m}$,
respectively, then the $m$ columns of $Q$ form an orthonormal basis (with respect to the
standard scalar product) of an $m$-dimensional subspace of $\mathbb{R}^{n,1}$ or $\mathbb{C}^{n,1}$, respectively.
A “matrix version” of Theorem 12.11 can therefore be formulated as follows. 


Corollary 12.12 Let $K = \mathbb{R}$ or $K = \mathbb{C}$ and let $v_1, \dots, v_m \in K^{n,1}$ be linearly independent.
Then there exists a matrix $Q \in K^{n,m}$ with its $m$ columns being orthonormal
with respect to the standard scalar product of $K^{n,1}$, i.e., $Q^T Q = I_m$ for $K = \mathbb{R}$ or
$Q^H Q = I_m$ for $K = \mathbb{C}$, and an upper triangular matrix $R \in GL_m(K)$, such that

$$[v_1, \dots, v_m] = QR. \qquad (12.3)$$

The factorization (12.3) is called a QR-decomposition of the matrix $[v_1, \dots, v_m]$.
The QR-decomposition has many applications in Numerical Mathematics (cp.
Example 12.16 below).


Lemma 12.13 Let $K = \mathbb{R}$ or $K = \mathbb{C}$ and let $Q \in K^{n,m}$ be a matrix with orthonormal
columns with respect to the standard scalar product of $K^{n,1}$. Then $\|v\| = \|Qv\|$
holds for all $v \in K^{m,1}$. (Here $\|\cdot\|$ is the Euclidean norm of $K^{m,1}$ and of $K^{n,1}$.)

Proof For $K = \mathbb{C}$ we have

$$\|v\|^2 = \langle v, v \rangle = v^H v = v^H (Q^H Q) v = \langle Qv, Qv \rangle = \|Qv\|^2,$$

and the proof for $K = \mathbb{R}$ is analogous. □
We now introduce two important classes of matrices.

Definition 12.14

(1) A matrix $Q \in \mathbb{R}^{n,n}$ whose columns form an orthonormal basis with respect to
the standard scalar product of $\mathbb{R}^{n,1}$ is called orthogonal.
(2) A matrix $Q \in \mathbb{C}^{n,n}$ whose columns form an orthonormal basis with respect to
the standard scalar product of $\mathbb{C}^{n,1}$ is called unitary.

A matrix $Q = [q_1, \dots, q_n] \in \mathbb{R}^{n,n}$ is therefore orthogonal if and only if

$$Q^T Q = [q_i^T q_j] = [\langle q_j, q_i \rangle] = [\delta_{ji}] = I_n.$$

In particular, an orthogonal matrix $Q$ is invertible with $Q^{-1} = Q^T$ (cp. Corollary 7.20).
The equation $Q Q^T = I_n$ means that the $n$ rows of $Q$ form an orthonormal
basis of $\mathbb{R}^{1,n}$ (with respect to the scalar product $\langle v, w \rangle := w v^T$).

Analogously, a unitary matrix $Q \in \mathbb{C}^{n,n}$ is invertible with $Q^{-1} = Q^H$ and
$Q^H Q = I_n = Q Q^H$. The $n$ rows of $Q$ form an orthonormal basis of $\mathbb{C}^{1,n}$.


Lemma 12.15 The sets $O(n)$ of orthogonal and $U(n)$ of unitary $n \times n$ matrices form
subgroups of $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$, respectively.

Proof We consider only $O(n)$; the proof for $U(n)$ is analogous.

Since every orthogonal matrix is invertible, we have that $O(n) \subset GL_n(\mathbb{R})$. The
identity matrix $I_n$ is orthogonal, and hence $I_n \in O(n) \neq \emptyset$. If $Q \in O(n)$, then also
$Q^T = Q^{-1} \in O(n)$, since $(Q^T)^T Q^T = Q Q^T = I_n$. Finally, if $Q_1, Q_2 \in O(n)$,
then

$$(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T (Q_1^T Q_1) Q_2 = Q_2^T Q_2 = I_n,$$

and thus $Q_1 Q_2 \in O(n)$. □

Example 12.16 In many applications measurements or samples lead to a data set
that is represented by tuples $(\tau_i, \mu_i) \in \mathbb{R}^2$, $i = 1, \dots, m$. Here $\tau_1 < \cdots < \tau_m$
are the pairwise distinct measurement points and $\mu_1, \dots, \mu_m$ are the corresponding
measurements. In order to approximate the given data set by a simple model, one can
try to construct a polynomial $p$ of small degree so that the values $p(\tau_1), \dots, p(\tau_m)$
are as close as possible to the measurements $\mu_1, \dots, \mu_m$.


The simplest case is a real polynomial of degree (at most) 1. Geometrically, this
corresponds to the construction of a straight line in $\mathbb{R}^2$ that has a minimal distance
to the given points, as shown in the figure below (cp. Sect. 1.4). There are many
possibilities to measure the distance. In the following we will describe one of them
in more detail and use the Gram-Schmidt method, or the QR-decomposition, for the
construction of the straight line. In Statistics this method is called linear regression.

[Figure: data points at $\tau_1, \dots, \tau_m$ and the approximating straight line $p(t)$.]


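A real polynomial of degree (at most) 1 has the form $p = \alpha t + \beta$ and we are
looking for coefficients $\alpha, \beta \in \mathbb{R}$ with

$$p(\tau_i) = \alpha \tau_i + \beta \approx \mu_i, \quad i = 1, \dots, m.$$

Using matrices we can write this problem as

$$\begin{bmatrix} \tau_1 & 1 \\ \vdots & \vdots \\ \tau_m & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_m \end{bmatrix} \quad \text{or} \quad [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx y.$$

As mentioned above, there are different possibilities for interpreting the symbol "$\approx$".
In particular, there are different norms in which we can measure the distance between
the given values $\mu_1, \dots, \mu_m$ and the polynomial values $p(\tau_1), \dots, p(\tau_m)$. Here we
will use the Euclidean norm $\|\cdot\|$ and consider the minimization problem

$$\min_{\alpha, \beta \in \mathbb{R}} \Big\| [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \Big\|.$$

The vectors $v_1, v_2 \in \mathbb{R}^{m,1}$ are linearly independent, since the entries of $v_1$ are
pairwise distinct, while all entries of $v_2$ are equal. Let

$$[v_1, v_2] = [q_1, q_2] R$$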


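be a QR-decomposition. We extend the vectors $q_1, q_2 \in \mathbb{R}^{m,1}$ to an orthonormal
basis $\{q_1, q_2, q_3, \dots, q_m\}$ of $\mathbb{R}^{m,1}$. Then $Q = [q_1, \dots, q_m] \in \mathbb{R}^{m,m}$ is an orthogonal
matrix and

$$\min_{\alpha, \beta \in \mathbb{R}} \Big\| [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \Big\| = \min_{\alpha, \beta \in \mathbb{R}} \Big\| [q_1, q_2] R \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \Big\|$$
$$= \min_{\alpha, \beta \in \mathbb{R}} \Big\| Q \begin{bmatrix} R \\ 0_{m-2,2} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \Big\|$$
$$= \min_{\alpha, \beta \in \mathbb{R}} \Big\| Q \Big( \begin{bmatrix} R \\ 0_{m-2,2} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - Q^T y \Big) \Big\|$$
$$= \min_{\alpha, \beta \in \mathbb{R}} \left\| \begin{bmatrix} R \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \\ 0 \\ \vdots \\ 0 \end{bmatrix} - \begin{bmatrix} q_1^T y \\ q_2^T y \\ q_3^T y \\ \vdots \\ q_m^T y \end{bmatrix} \right\|.$$

Here we have used that $Q Q^T = I_m$ and $\|Qv\| = \|v\|$ for all $v \in \mathbb{R}^{m,1}$. The upper
triangular matrix $R$ is invertible and thus the minimization problem is solved by

$$\begin{bmatrix} \widetilde{\alpha} \\ \widetilde{\beta} \end{bmatrix} = R^{-1} \begin{bmatrix} q_1^T y \\ q_2^T y \end{bmatrix}.$$

Using the definition of the Euclidean norm, we can write the minimizing property
of the polynomial $\widetilde{p} := \widetilde{\alpha} t + \widetilde{\beta}$ as

$$\Big\| [v_1, v_2] \begin{bmatrix} \widetilde{\alpha} \\ \widetilde{\beta} \end{bmatrix} - y \Big\|^2 = \sum_{i=1}^{m} (\widetilde{p}(\tau_i) - \mu_i)^2 = \min_{\alpha, \beta \in \mathbb{R}} \Big( \sum_{i=1}^{m} ((\alpha \tau_i + \beta) - \mu_i)^2 \Big).$$

Since the polynomial $\widetilde{p}$ minimizes the sum of squares of the distances between the
measurements $\mu_i$ and the polynomial values $\widetilde{p}(\tau_i)$, this polynomial yields a least
squares approximation of the measurement values.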

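Consider the example from Sect. 1.4. In the four quarters of a year, a company has
profits of 10, 8, 9, 11 million Euros. Under the assumption that the profit grows
linearly, i.e., like a straight line, the goal is to estimate the profit in the last quarter
of the following year. The given data leads to the approximation problem

$$\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx \begin{bmatrix} 10 \\ 8 \\ 9 \\ 11 \end{bmatrix} \quad \text{or} \quad [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx y.$$

The numerical computation of a QR-decomposition of $[v_1, v_2]$ yields

$$\begin{bmatrix} \widetilde{\alpha} \\ \widetilde{\beta} \end{bmatrix} = \underbrace{\begin{bmatrix} \sqrt{30} & \tfrac{1}{3}\sqrt{30} \\ 0 & \tfrac{1}{3}\sqrt{6} \end{bmatrix}^{-1}}_{= R^{-1}} \underbrace{\begin{bmatrix} \tfrac{1}{\sqrt{30}} & \tfrac{2}{\sqrt{30}} & \tfrac{3}{\sqrt{30}} & \tfrac{4}{\sqrt{30}} \\ \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & 0 & -\tfrac{1}{\sqrt{6}} \end{bmatrix}}_{= [q_1, q_2]^T} \begin{bmatrix} 10 \\ 8 \\ 9 \\ 11 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 8.5 \end{bmatrix},$$

and the resulting profit estimate for the last quarter of the following year is $\widetilde{p}(8) =
11.7$, i.e., 11.7 million Euros.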
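
The computation in this example can be retraced with a short script. The sketch below carries out the two Gram-Schmidt steps for $v_1, v_2$ by hand and then solves the triangular system by back substitution; the variable names are illustrative, and plain Python lists stand in for the vectors.

```python
import math

tau = [1.0, 2.0, 3.0, 4.0]    # measurement points (quarters)
mu = [10.0, 8.0, 9.0, 11.0]   # measurements (profits in million Euros)


def dot(x, y):
    """Standard scalar product of two real vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))


# Columns of the matrix [v1, v2].
v1 = tau
v2 = [1.0] * len(tau)

# Gram-Schmidt: [v1, v2] = [q1, q2] R with R = [[r11, r12], [0, r22]].
r11 = math.sqrt(dot(v1, v1))
q1 = [x / r11 for x in v1]
r12 = dot(v2, q1)
u2 = [x - r12 * y for x, y in zip(v2, q1)]
r22 = math.sqrt(dot(u2, u2))
q2 = [x / r22 for x in u2]

# Solve R [alpha, beta]^T = [q1^T y, q2^T y]^T by back substitution.
c1, c2 = dot(q1, mu), dot(q2, mu)
beta = c2 / r22
alpha = (c1 - r12 * beta) / r11


def p(t):
    """Least squares line alpha * t + beta."""
    return alpha * t + beta


print(alpha, beta, p(8))   # approximately 0.4, 8.5 and 11.7
```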


MATLAB-Minute.

In Example 12.16 one could imagine that the profit grows quadratically instead
of linearly. Determine, analogously to the procedure in Example 12.16, a polynomial
$\widetilde{p} = \widetilde{\alpha} t^2 + \widetilde{\beta} t + \widetilde{\gamma}$ that solves the least squares problem

$$\sum_{i=1}^{4} (\widetilde{p}(\tau_i) - \mu_i)^2 = \min_{\alpha, \beta, \gamma \in \mathbb{R}} \Big( \sum_{i=1}^{4} ((\alpha \tau_i^2 + \beta \tau_i + \gamma) - \mu_i)^2 \Big).$$

Use the MATLAB command qr for computing a QR-decomposition, and
determine the estimated profit in the last quarter of the following year.


We will now analyze the properties of orthonormal bases in more detail. 


Lemma 12.17 If $\mathcal{V}$ is a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$ and the orthonormal basis $\{u_1, \dots, u_n\}$, then

$$v = \sum_{i=1}^{n} \langle v, u_i \rangle u_i$$

for all $v \in \mathcal{V}$.

Proof For every $v \in \mathcal{V}$ there exist uniquely determined coordinates $\lambda_1, \dots, \lambda_n$ with
$v = \sum_{i=1}^{n} \lambda_i u_i$. For every $j = 1, \dots, n$ we then have $\langle v, u_j \rangle = \sum_{i=1}^{n} \lambda_i \langle u_i, u_j \rangle = \lambda_j$. □

The coordinates $\langle v, u_i \rangle$, $i = 1, \dots, n$, of $v$ with respect to an orthonormal basis
$\{u_1, \dots, u_n\}$ are often called the Fourier coefficients⁶ of $v$ with respect to this basis.
The representation $v = \sum_{i=1}^{n} \langle v, u_i \rangle u_i$ is called the (abstract) Fourier expansion of
$v$ in the given orthonormal basis.

⁶ Jean Baptiste Joseph Fourier (1768-1830).

Corollary 12.18 If $\mathcal{V}$ is a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$ and the orthonormal basis $\{u_1, \dots, u_n\}$, then the following assertions hold:

(1) $\langle v, w \rangle = \sum_{i=1}^{n} \langle v, u_i \rangle \langle u_i, w \rangle = \sum_{i=1}^{n} \langle v, u_i \rangle \overline{\langle w, u_i \rangle}$ for all $v, w \in \mathcal{V}$ (Parseval's identity⁷).
(2) $\langle v, v \rangle = \sum_{i=1}^{n} |\langle v, u_i \rangle|^2$ for all $v \in \mathcal{V}$ (Bessel's identity⁸).

Proof

(1) We have $v = \sum_{i=1}^{n} \langle v, u_i \rangle u_i$, and thus

$$\langle v, w \rangle = \Big\langle \sum_{i=1}^{n} \langle v, u_i \rangle u_i, w \Big\rangle = \sum_{i=1}^{n} \langle v, u_i \rangle \langle u_i, w \rangle = \sum_{i=1}^{n} \langle v, u_i \rangle \overline{\langle w, u_i \rangle}.$$

(2) is a special case of (1) for $v = w$. □

By Bessel's identity, every vector $v \in \mathcal{V}$ satisfies

$$\|v\|^2 = \langle v, v \rangle = \sum_{i=1}^{n} |\langle v, u_i \rangle|^2 \geq \max_{1 \leq i \leq n} |\langle v, u_i \rangle|^2,$$

where $\|\cdot\|$ is the norm induced by the scalar product. The absolute value of each
coordinate of $v$ with respect to an orthonormal basis of $\mathcal{V}$ is therefore bounded by
the norm of $v$. This property does not hold for a general basis of $\mathcal{V}$.
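
The Fourier expansion and the identities of Parseval and Bessel can be checked numerically. The following sketch uses the standard scalar product on $\mathbb{C}^{2,1}$; the orthonormal basis `basis`, the sample vectors and all function names are assumptions chosen for illustration.

```python
import math


def sp(v, w):
    """Standard scalar product <v, w> = w^H v (vectors as lists)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(v, w))


s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]   # an orthonormal basis of C^{2,1}


def fourier_coefficients(v):
    """Coordinates <v, u_i> of v with respect to the orthonormal basis."""
    return [sp(v, u) for u in basis]


def expand(coeffs):
    """Rebuild the vector sum_i coeffs[i] * u_i from its coefficients."""
    return [sum(c * u[i] for c, u in zip(coeffs, basis))
            for i in range(len(basis[0]))]


# Hypothetical sample vectors.
v = [2 - 1j, 1 + 3j]
w = [1j, 4]
cv, cw = fourier_coefficients(v), fourier_coefficients(w)

parseval = sum(a * b.conjugate() for a, b in zip(cv, cw))
bessel = sum(abs(a) ** 2 for a in cv)

print(expand(cv))              # reproduces v up to rounding
print(parseval, sp(v, w))      # both sides of Parseval's identity
print(bessel, sp(v, v).real)   # both sides of Bessel's identity
```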


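Example 12.19 Consider $\mathcal{V} = \mathbb{R}^{2,1}$ with the standard scalar product and the Euclidean
norm. Then for every real $\varepsilon \neq 0$ the set

$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ \varepsilon \end{bmatrix} \right\}$$

is a basis of $\mathcal{V}$. For every vector $v = [\nu_1, \nu_2]^T$ we then have

$$v = \Big( \nu_1 - \frac{\nu_2}{\varepsilon} \Big) \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{\nu_2}{\varepsilon} \begin{bmatrix} 1 \\ \varepsilon \end{bmatrix}.$$

If $|\nu_1|, |\nu_2|$ are moderate numbers and if $|\varepsilon|$ is (very) small, then $|\nu_1 - \nu_2/\varepsilon|$ and
$|\nu_2/\varepsilon|$ are (very) large. In numerical algorithms such a situation can lead to significant
problems (e.g. due to roundoff errors) that are avoided when orthonormal bases are
used.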
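
A small experiment makes the growth of the coordinates in Example 12.19 visible. The function name `coordinates` and the values of $v$ and $\varepsilon$ below are assumptions chosen for illustration.

```python
def coordinates(v, eps):
    """Coordinates of v = [nu1, nu2]^T in the basis {[1, 0]^T, [1, eps]^T}."""
    nu1, nu2 = v
    return (nu1 - nu2 / eps, nu2 / eps)


eps = 1e-8
c = coordinates((1.0, 1.0), eps)
print(c)   # both coordinates have absolute value of about 1e8

# In the orthonormal basis {e1, e2} the coordinates of the same vector are
# simply (1.0, 1.0), and they are bounded by the Euclidean norm sqrt(2).
```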


⁷ Marc-Antoine Parseval (1755-1836).
⁸ Friedrich Wilhelm Bessel (1784-1846).




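Definition 12.20 Let $\mathcal{V}$ be a Euclidean or unitary vector space with the scalar product
$\langle \cdot, \cdot \rangle$, and let $\mathcal{U} \subseteq \mathcal{V}$ be a subspace. Then

$$\mathcal{U}^\perp := \{v \in \mathcal{V} \mid \langle v, u \rangle = 0 \text{ for all } u \in \mathcal{U}\}$$

is called the orthogonal complement of $\mathcal{U}$ (in $\mathcal{V}$).

Lemma 12.21 The orthogonal complement $\mathcal{U}^\perp$ is a subspace of $\mathcal{V}$.

Proof Exercise. □

Lemma 12.22 If $\mathcal{V}$ is an $n$-dimensional Euclidean or unitary vector space, and if
$\mathcal{U} \subseteq \mathcal{V}$ is an $m$-dimensional subspace, then $\dim(\mathcal{U}^\perp) = n - m$ and $\mathcal{V} = \mathcal{U} \oplus \mathcal{U}^\perp$.

Proof We know that $m \leq n$ (cp. Lemma 9.27). If $m = n$, then $\mathcal{U} = \mathcal{V}$, and thus

$$\mathcal{U}^\perp = \mathcal{V}^\perp = \{v \in \mathcal{V} \mid \langle v, u \rangle = 0 \text{ for all } u \in \mathcal{V}\} = \{0\},$$

so that the assertion is trivial.

Thus let $m < n$ and let $\{u_1, \dots, u_m\}$ be an orthonormal basis of $\mathcal{U}$. We extend
this basis to a basis of $\mathcal{V}$ and apply the Gram-Schmidt method in order to obtain an
orthonormal basis $\{u_1, \dots, u_m, u_{m+1}, \dots, u_n\}$ of $\mathcal{V}$. Then $\operatorname{span}\{u_{m+1}, \dots, u_n\} \subseteq \mathcal{U}^\perp$
and therefore $\mathcal{V} = \mathcal{U} + \mathcal{U}^\perp$. If $w \in \mathcal{U} \cap \mathcal{U}^\perp$, then $\langle w, w \rangle = 0$, and hence $w = 0$,
since the scalar product is positive definite. Thus, $\mathcal{U} \cap \mathcal{U}^\perp = \{0\}$, which implies that
$\mathcal{V} = \mathcal{U} \oplus \mathcal{U}^\perp$ and $\dim(\mathcal{U}^\perp) = n - m$ (cp. Theorem 9.29). In particular, we have
$\mathcal{U}^\perp = \operatorname{span}\{u_{m+1}, \dots, u_n\}$. □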


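12.3 The Vector Product in $\mathbb{R}^{3,1}$


In this section we consider a further product on the vector space $\mathbb{R}^{3,1}$ that is frequently
used in Physics and Electrical Engineering.

Definition 12.23 The vector product or cross product in $\mathbb{R}^{3,1}$ is the map

$$\mathbb{R}^{3,1} \times \mathbb{R}^{3,1} \to \mathbb{R}^{3,1}, \quad (v, w) \mapsto v \times w := [\nu_2 \omega_3 - \nu_3 \omega_2,\ \nu_3 \omega_1 - \nu_1 \omega_3,\ \nu_1 \omega_2 - \nu_2 \omega_1]^T,$$

where $v = [\nu_1, \nu_2, \nu_3]^T$ and $w = [\omega_1, \omega_2, \omega_3]^T$.

In contrast to the scalar product, the vector product of two elements of the vector
space $\mathbb{R}^{3,1}$ is not a scalar but again a vector in $\mathbb{R}^{3,1}$. Using the canonical basis vectors
of $\mathbb{R}^{3,1}$,

$$e_1 = [1, 0, 0]^T, \quad e_2 = [0, 1, 0]^T, \quad e_3 = [0, 0, 1]^T,$$

we can write the vector product as

$$v \times w = \det\left( \begin{bmatrix} \nu_2 & \omega_2 \\ \nu_3 & \omega_3 \end{bmatrix} \right) e_1 - \det\left( \begin{bmatrix} \nu_1 & \omega_1 \\ \nu_3 & \omega_3 \end{bmatrix} \right) e_2 + \det\left( \begin{bmatrix} \nu_1 & \omega_1 \\ \nu_2 & \omega_2 \end{bmatrix} \right) e_3.$$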


Lemma 12.24 The vector product is linear in both components and for all $v, w \in
\mathbb{R}^{3,1}$ the following properties hold:

(1) $v \times w = -w \times v$, i.e., the vector product is anti-commutative or alternating.
(2) $\|v \times w\|^2 = \|v\|^2 \|w\|^2 - \langle v, w \rangle^2$, where $\langle \cdot, \cdot \rangle$ is the standard scalar product
and $\|\cdot\|$ the Euclidean norm of $\mathbb{R}^{3,1}$.
(3) $\langle v, v \times w \rangle = \langle w, v \times w \rangle = 0$, where $\langle \cdot, \cdot \rangle$ is the standard scalar product of $\mathbb{R}^{3,1}$.

Proof Exercise. □


By (2) and the Cauchy-Schwarz inequality (12.2), it follows that $v \times w = 0$ holds
if and only if $v, w$ are linearly dependent. From (3) we obtain

$$\langle \lambda v + \mu w, v \times w \rangle = \lambda \langle v, v \times w \rangle + \mu \langle w, v \times w \rangle = 0,$$

for arbitrary $\lambda, \mu \in \mathbb{R}$. If $v, w$ are linearly independent, then the product $v \times w$ is
orthogonal to the plane through the origin spanned by $v$ and $w$ in $\mathbb{R}^{3,1}$, i.e.,

$$v \times w \in \{\lambda v + \mu w \mid \lambda, \mu \in \mathbb{R}\}^\perp.$$

Geometrically, there are two possibilities:

[Figure: the two possible orientations of $v \times w$ relative to $v$ and $w$; the left one is labeled "right-hand rule".]

The positions of the three vectors $v, w, v \times w$ on the left side of this figure correspond
to the "right-handed orientation" of the usual coordinate system of $\mathbb{R}^{3,1}$, where the
canonical basis vectors $e_1, e_2, e_3$ are associated with thumb, index finger and middle
finger of the right hand. This motivates the name right-hand rule. In order to explain
this in detail, one needs to introduce the concept of orientation, which we omit here.

If $\varphi \in [0, \pi]$ is the angle between the vectors $v, w$, then

$$\langle v, w \rangle = \|v\|\, \|w\| \cos(\varphi)$$

(cp. Definition 12.7) and we can write (2) in Lemma 12.24 as

$$\|v \times w\|^2 = \|v\|^2 \|w\|^2 - \|v\|^2 \|w\|^2 \cos^2(\varphi) = \|v\|^2 \|w\|^2 \sin^2(\varphi),$$

so that

$$\|v \times w\| = \|v\|\, \|w\| \sin(\varphi).$$

A geometric interpretation of this equation is the following: The norm of the vector
product of $v$ and $w$ is equal to the area of the parallelogram spanned by $v$ and $w$.
This interpretation is illustrated in the following figure:

[Figure: parallelogram spanned by $v$ and $w$.]
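
The definition of the vector product and the properties (1)-(3) of Lemma 12.24 can be checked on a concrete pair of vectors. The following sketch uses integer entries so that all identities hold exactly; the function names and the sample vectors are illustrative assumptions.

```python
import math


def cross(v, w):
    """Vector product v x w in R^{3,1} (vectors as lists of length 3)."""
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]


def dot(v, w):
    """Standard scalar product of two real vectors."""
    return sum(a * b for a, b in zip(v, w))


# Hypothetical sample vectors.
v = [1, 2, 3]
w = [4, 5, 6]
c = cross(v, w)

print(c, cross(w, v))                                        # (1): sign change
print(dot(c, c), dot(v, v) * dot(w, w) - dot(v, w) ** 2)     # (2): equal values
print(dot(v, c), dot(w, c))                                  # (3): both zero

# Area of the parallelogram spanned by v and w.
area = math.sqrt(dot(c, c))
print(area)
```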


Exercises 


12.1 Let $V$ be a finite dimensional real or complex vector space. Show that there exists a scalar product on $V$.

12.2 Show that the maps defined in Example 12.2 are scalar products on the corresponding vector spaces.

12.3 Let $\langle\cdot,\cdot\rangle$ be an arbitrary scalar product on $\mathbb{R}^{n,1}$. Show that there exists a matrix $A \in \mathbb{R}^{n,n}$ with $\langle v, w\rangle = w^T A v$ for all $v, w \in \mathbb{R}^{n,1}$.

12.4 Let $V$ be a finite dimensional $\mathbb{R}$- or $\mathbb{C}$-vector space. Let $s_1$ and $s_2$ be scalar products on $V$ with the following property: If $v, w \in V$ satisfy $s_1(v, w) = 0$, then also $s_2(v, w) = 0$. Prove or disprove: There exists a real scalar $\lambda > 0$ with $s_1(v, w) = \lambda s_2(v, w)$ for all $v, w \in V$.

12.5 Show that the maps defined in Example 12.4 are norms on the corresponding vector spaces.

12.6 Show that

$$\|A\|_1 = \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}| \quad\text{and}\quad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}|$$

for all $A = [a_{ij}] \in K^{n,m}$, where $K = \mathbb{R}$ or $K = \mathbb{C}$ (cp. (6) in Example 12.4).

12.7 Sketch for the matrix $A$ from (6) in Example 12.4 and $p \in \{1, 2, \infty\}$, the sets $\{Av \mid v \in \mathbb{R}^{2,1},\ \|v\|_p = 1\} \subset \mathbb{R}^{2,1}$.

12.8 Let $V$ be a Euclidean or unitary vector space and let $\|\cdot\|$ be the norm induced by a scalar product on $V$. Show that $\|\cdot\|$ satisfies the parallelogram identity

$$\|v + w\|^2 + \|v - w\|^2 = 2\,(\|v\|^2 + \|w\|^2)$$

for all $v, w \in V$.

12.9 Let $V$ be a $K$-vector space ($K = \mathbb{R}$ or $K = \mathbb{C}$) with the scalar product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Show that $v, w \in V$ are orthogonal with respect to $\langle\cdot,\cdot\rangle$ if and only if $\|v + \lambda w\| = \|v - \lambda w\|$ for all $\lambda \in K$.

12.10 Does there exist a scalar product $\langle\cdot,\cdot\rangle$ on $\mathbb{C}^{n,1}$, such that the 1-norm of $\mathbb{C}^{n,1}$ (cp. (5) in Example 12.4) is the norm induced by this scalar product?

12.11 Show that the inequality

$$\Big(\sum_{i=1}^{n} \alpha_i \beta_i\Big)^2 \le \sum_{i=1}^{n} (\gamma_i \alpha_i)^2 \cdot \sum_{i=1}^{n} \Big(\frac{\beta_i}{\gamma_i}\Big)^2$$

holds for arbitrary real numbers $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n$ and positive real numbers $\gamma_1, \dots, \gamma_n$.

12.12 Let $V$ be a finite dimensional Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$. Let $f : V \to V$ be a map with $\langle f(v), f(w)\rangle = \langle v, w\rangle$ for all $v, w \in V$. Show that $f$ is an isomorphism.

12.13 Let $V$ be a unitary vector space and suppose that $f \in L(V, V)$ satisfies $\langle f(v), v\rangle = 0$ for all $v \in V$. Prove or disprove that $f = 0$. Does the same statement also hold for Euclidean vector spaces?

12.14 Let $D = \mathrm{diag}(d_1, \dots, d_n) \in \mathbb{R}^{n,n}$ with $d_1, \dots, d_n > 0$. Show that $\langle v, w\rangle = w^T D v$ is a scalar product on $\mathbb{R}^{n,1}$. Analyze which properties of a scalar product are violated if at least one of the $d_i$ is zero, or when all $d_i$ are nonzero but have different signs.

12.15 Orthonormalize the following basis of the vector space $\mathbb{C}^{2,2}$ with respect to the scalar product $\langle A, B\rangle = \mathrm{trace}(B^H A)$:

$$\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right\}.$$

12.16 Let $Q \in \mathbb{R}^{n,n}$ be an orthogonal or let $Q \in \mathbb{C}^{n,n}$ be a unitary matrix. What are the possible values of $\det(Q)$?

12.17 Let $u \in \mathbb{R}^{n,1} \setminus \{0\}$ and let

$$H(u) = I_n - 2\,\frac{1}{u^T u}\, u u^T \in \mathbb{R}^{n,n}.$$

Show that the $n$ columns of $H(u)$ form an orthonormal basis of $\mathbb{R}^{n,1}$ with respect to the standard scalar product. (Matrices of this form are called Householder matrices, after Alston Scott Householder (1904–1993), a pioneer of Numerical Linear Algebra. We will study them in more detail in Example 18.15.)

12.18 Prove Lemma 12.21.

12.19 Let


I ot 
/2 ya 

[v1, v2, v3] = -30 ER”. 
0 00 


Analyze whether the vectors $v_1, v_2, v_3$ are orthonormal with respect to the standard scalar product and compute the orthogonal complement of $\mathrm{span}\{v_1, v_2, v_3\}$.

12.20 Let $V$ be a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, let $u_1, \dots, u_k \in V$ and let $U = \mathrm{span}\{u_1, \dots, u_k\}$. Show that for $v \in V$ we have $v \in U^{\perp}$ if and only if $\langle v, u_j\rangle = 0$ for $j = 1, \dots, k$.

12.21 In the unitary vector space $\mathbb{C}^{4,1}$ with the standard scalar product let $v_1 = [-1, i, 0, 1]^T$ and $v_2 = [i, 0, 2, 0]^T$ be given. Determine an orthonormal basis of $\mathrm{span}\{v_1, v_2\}^{\perp}$.

12.22 Prove Lemma 12.24. 


Chapter 13 
Adjoints of Linear Maps 


In this chapter we introduce adjoints of linear maps. In some sense these represent 
generalizations of the (Hermitian) transposes of matrices. A matrix is symmetric
(or Hermitian) if it is equal to its (Hermitian) transpose. In an analogous way, an
endomorphism is selfadjoint if it is equal to its adjoint endomorphism. The sets of 
symmetric (or Hermitian) matrices and of selfadjoint endomorphisms form certain 
vector spaces which will play a key role in our proof of the Fundamental Theorem of 
Algebra in Chap. 15. Special properties of selfadjoint endomorphisms will be studied 
in Chap. 18. 


13.1 Basic Definitions and Properties 


In Chap. 12 we have considered Euclidean and unitary vector spaces, and hence 
vector spaces over the fields R and C. Now let V and W be vector spaces over a 
general field $K$, and let $\beta$ be a bilinear form on $V \times W$.

For every fixed vector $v \in V$, the map

$$\beta_v : W \to K, \quad w \mapsto \beta(v, w),$$

is a linear form on $W$. Thus, we can assign to every $v \in V$ a vector $\beta_v \in W^*$, which defines the map

$$\beta^{(1)} : V \to W^*, \quad v \mapsto \beta_v. \tag{13.1}$$

Analogously, we define the map

$$\beta^{(2)} : W \to V^*, \quad w \mapsto \beta_w, \tag{13.2}$$

where $\beta_w : V \to K$ is defined by $v \mapsto \beta(v, w)$ for every $w \in W$.

Lemma 13.1 The maps $\beta^{(1)}$ and $\beta^{(2)}$ defined in (13.1) and (13.2), respectively, are linear, i.e., $\beta^{(1)} \in L(V, W^*)$ and $\beta^{(2)} \in L(W, V^*)$. If $\dim(V) = \dim(W) \in \mathbb{N}$ and $\beta$ is non-degenerate (cp. Definition 11.9), then $\beta^{(1)}$ and $\beta^{(2)}$ are bijective and thus isomorphisms.


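Proof We prove the assertion only for the map $\beta^{(1)}$; the proof for $\beta^{(2)}$ is analogous. We first show the linearity. Let $v_1, v_2 \in V$ and $\lambda_1, \lambda_2 \in K$. For every $w \in W$ we then have

$$\begin{aligned}
\beta^{(1)}(\lambda_1 v_1 + \lambda_2 v_2)(w) &= \beta(\lambda_1 v_1 + \lambda_2 v_2, w) \\
&= \lambda_1 \beta(v_1, w) + \lambda_2 \beta(v_2, w) \\
&= \lambda_1 \beta^{(1)}(v_1)(w) + \lambda_2 \beta^{(1)}(v_2)(w) \\
&= \big(\lambda_1 \beta^{(1)}(v_1) + \lambda_2 \beta^{(1)}(v_2)\big)(w),
\end{aligned}$$

and hence $\beta^{(1)}(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 \beta^{(1)}(v_1) + \lambda_2 \beta^{(1)}(v_2)$. Therefore, $\beta^{(1)} \in L(V, W^*)$.

Let now $\dim(V) = \dim(W) \in \mathbb{N}$ and let $\beta$ be non-degenerate. We show that $\beta^{(1)} \in L(V, W^*)$ is injective. By (5) in Lemma 10.7, this holds if and only if $\ker(\beta^{(1)}) = \{0\}$. If $v \in \ker(\beta^{(1)})$, then $\beta^{(1)}(v) = \beta_v = 0 \in W^*$, and thus

$$\beta_v(w) = \beta(v, w) = 0 \quad\text{for all } w \in W.$$

Since $\beta$ is non-degenerate, we have $v = 0$. Finally, $\dim(V) = \dim(W)$ and $\dim(W) = \dim(W^*)$ imply that $\dim(V) = \dim(W^*)$, so that $\beta^{(1)}$ is bijective (cp. Corollary 10.11). □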


We next discuss the existence of the adjoint map. 


Theorem 13.2 If $V$ and $W$ are $K$-vector spaces with $\dim(V) = \dim(W) \in \mathbb{N}$ and $\beta$ is a non-degenerate bilinear form on $V \times W$, then the following assertions hold:

(1) For every $f \in L(V, V)$ there exists a uniquely determined $g \in L(W, W)$ with

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{for all } v \in V \text{ and } w \in W.$$

The map $g$ is called the right adjoint of $f$ with respect to $\beta$.

(2) For every $h \in L(W, W)$ there exists a uniquely determined $k \in L(V, V)$ with

$$\beta(v, h(w)) = \beta(k(v), w) \quad\text{for all } v \in V \text{ and } w \in W.$$

The map $k$ is called the left adjoint of $h$ with respect to $\beta$.


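Proof We only show (1); the proof of (2) is analogous.

Let $V^*$ be the dual space of $V$, let $f^* \in L(V^*, V^*)$ be the dual map of $f$, and let $\beta^{(2)} \in L(W, V^*)$ be as in (13.2). Since $\beta$ is non-degenerate, $\beta^{(2)}$ is bijective by Lemma 13.1. Define

$$g := (\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)} \in L(W, W).$$

Then, for all $v \in V$ and $w \in W$,

$$\begin{aligned}
\beta(v, g(w)) &= \beta\big(v, ((\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)})(w)\big) \\
&= \beta^{(2)}\big(((\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)})(w)\big)(v) \\
&= \beta^{(2)}\big((\beta^{(2)})^{-1}(f^*(\beta^{(2)}(w)))\big)(v) \\
&= f^*\big(\beta^{(2)}(w)\big)(v) \\
&= \beta^{(2)}(w)(f(v)) \\
&= \beta(f(v), w).
\end{aligned}$$

(Recall that the dual map satisfies $f^*(\beta^{(2)}(w)) = \beta^{(2)}(w) \circ f$.)

It remains to show the uniqueness of $g$. Let $\tilde g \in L(W, W)$ with $\beta(v, \tilde g(w)) = \beta(f(v), w)$ for all $v \in V$ and $w \in W$. Then $\beta(v, \tilde g(w)) = \beta(v, g(w))$, and hence

$$\beta(v, (\tilde g - g)(w)) = 0 \quad\text{for all } v \in V \text{ and } w \in W.$$

Since $\beta$ is non-degenerate in the second variable, we have $(\tilde g - g)(w) = 0$ for all $w \in W$, so that $g = \tilde g$. □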


Example 13.3 Let $V = W = K^{n,1}$ and $\beta(v, w) = w^T B v$ with a matrix $B \in GL_n(K)$, so that $\beta$ is non-degenerate (cp. (1) in Example 11.10). We consider the linear map $f : V \to V$, $v \mapsto Fv$, with a matrix $F \in K^{n,n}$, and the linear map $h : W \to W$, $w \mapsto Hw$, with a matrix $H \in K^{n,n}$. Then

$$\begin{aligned}
\beta_v &: W \to K, & w &\mapsto w^T (Bv), \\
\beta^{(1)} &: V \to W^*, & v &\mapsto (Bv)^T, \\
\beta^{(2)} &: W \to V^*, & w &\mapsto w^T B,
\end{aligned}$$

where we have identified the isomorphic vector spaces $W^*$ and $K^{1,n}$, respectively $V^*$ and $K^{1,n}$, with each other. If $g \in L(W, W)$ is the right adjoint of $f$ with respect to $\beta$, then

$$\beta(f(v), w) = w^T B f(v) = w^T B F v = \beta(v, g(w)) = g(w)^T B v$$

for all $v \in V$ and $w \in W$. If we represent the linear map $g$ via the multiplication with a matrix $G \in K^{n,n}$, i.e., $g(w) = Gw$, then $w^T B F v = w^T G^T B v$ for all $v, w \in K^{n,1}$. Hence $BF = G^T B$. Since $B$ is invertible, the unique right adjoint is given by $G = (B F B^{-1})^T = B^{-T} F^T B^T$.
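The formula for the right adjoint can be sanity-checked on a small instance. The sketch below works over the rationals with exact `Fraction` arithmetic; the matrices `B` and `F`, the test vectors, and the helper names (`matmul`, `transpose`, `inv2`, `beta`) are assumptions made for this illustration and do not come from the text.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(A):
    """Inverse of an invertible 2x2 matrix (exact arithmetic)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def beta(v, w, B):
    """Bilinear form beta(v, w) = w^T B v for column vectors given as lists."""
    Bv = [sum(B[i][j] * v[j] for j in range(len(v))) for i in range(len(B))]
    return sum(wi * x for wi, x in zip(w, Bv))

# Hypothetical data: B invertible and not symmetric, F arbitrary.
B = [[1, 2], [0, 1]]
F = [[1, 1], [2, 0]]

# Right adjoint G = B^{-T} F^T B^T.
G = matmul(matmul(transpose(inv2(B)), transpose(F)), transpose(B))

# The defining matrix identity B F = G^T B.
lhs = matmul(B, F)
rhs = matmul(transpose(G), B)

# Spot check of beta(f(v), w) = beta(v, g(w)) for one pair of vectors.
v, w = [3, -1], [2, 5]
Fv = [sum(F[i][j] * v[j] for j in range(2)) for i in range(2)]
Gw = [sum(G[i][j] * w[j] for j in range(2)) for i in range(2)]
print(G, lhs == rhs, beta(Fv, w, B) == beta(v, Gw, B))
```

Since the identity $BF = G^T B$ is an equality of matrices, checking it once is equivalent to checking the adjoint relation for all $v, w$; the spot check is only there to make the connection to $\beta$ visible.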

Analogously, for the left adjoint $k \in L(V, V)$ of $h$ with respect to $\beta$ we obtain the equation

$$\beta(v, h(w)) = (h(w))^T B v = w^T H^T B v = \beta(k(v), w) = w^T B k(v)$$

for all $v \in V$ and $w \in W$. With $k(v) = Lv$ for a matrix $L \in K^{n,n}$, we obtain $H^T B = BL$ and hence $L = B^{-1} H^T B$.

If $V$ is finite dimensional and $\beta$ is a non-degenerate bilinear form on $V$, then by Theorem 13.2 every $f \in L(V, V)$ has a unique right adjoint $g$ and a unique left adjoint $k$, such that

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{and}\quad \beta(v, f(w)) = \beta(k(v), w) \tag{13.3}$$

for all $v, w \in V$. If $\beta$ is symmetric, i.e., if $\beta(v, w) = \beta(w, v)$ holds for all $v, w \in V$, then (13.3) yields

$$\beta(v, g(w)) = \beta(f(v), w) = \beta(w, f(v)) = \beta(k(w), v) = \beta(v, k(w)).$$

Therefore, $\beta(v, (g - k)(w)) = 0$ for all $v, w \in V$, and hence $g = k$, since $\beta$ is non-degenerate. Thus, we have proved the following result.


Corollary 13.4 If $\beta$ is a symmetric and non-degenerate bilinear form on a finite dimensional $K$-vector space $V$, then for every $f \in L(V, V)$ there exists a unique $g \in L(V, V)$ with

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{and}\quad \beta(v, f(w)) = \beta(g(v), w)$$

for all $v, w \in V$.


By definition, a scalar product on a Euclidean vector space is a symmetric and non- 
degenerate bilinear form (cp. Definition 12.1). This leads to the following corollary. 


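Corollary 13.5 If $V$ is a finite dimensional Euclidean vector space with the scalar product $\langle\cdot,\cdot\rangle$, then for every $f \in L(V, V)$ there exists a unique $f^{ad} \in L(V, V)$ with

$$\langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle \quad\text{and}\quad \langle v, f(w)\rangle = \langle f^{ad}(v), w\rangle \tag{13.4}$$

for all $v, w \in V$. The map $f^{ad}$ is called the adjoint of $f$ (with respect to $\langle\cdot,\cdot\rangle$).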


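In order to determine whether a given map $g \in L(V, V)$ is the unique adjoint of $f \in L(V, V)$, only one of the two conditions in (13.4) has to be verified: If for $f, g \in L(V, V)$ the equation

$$\langle f(v), w\rangle = \langle v, g(w)\rangle$$

holds for all $v, w \in V$, then also

$$\langle v, f(w)\rangle = \langle f(w), v\rangle = \langle w, g(v)\rangle = \langle g(v), w\rangle$$

for all $v, w \in V$, where we have used the symmetry of the scalar product. Similarly, if $\langle v, f(w)\rangle = \langle g(v), w\rangle$ holds for all $v, w \in V$, then also $\langle f(v), w\rangle = \langle v, g(w)\rangle$ for all $v, w \in V$.

Example 13.6 Consider the Euclidean vector space $\mathbb{R}^{3,1}$ with the scalar product

$$\langle v, w\rangle = w^T D v, \quad\text{where}\quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and the linear map

$$f : \mathbb{R}^{3,1} \to \mathbb{R}^{3,1}, \quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix}.$$

For all $v, w \in \mathbb{R}^{3,1}$ we then have

$$\langle f(v), w\rangle = w^T D F v = w^T D F D^{-1} D v = (D^{-T} F^T D^T w)^T D v = \langle v, f^{ad}(w)\rangle,$$

and thus

$$f^{ad} : \mathbb{R}^{3,1} \to \mathbb{R}^{3,1}, \quad v \mapsto D^{-1} F^T D v = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 0 \\ 2 & 2 & 0 \end{bmatrix} v,$$

where we have used that $D$ is symmetric.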
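The matrix of $f^{ad}$ in Example 13.6 can be recomputed mechanically. The following sketch uses the data $D$ and $F$ of the example; because $D$ is diagonal, the entry $(i, j)$ of $D^{-1} F^T D$ is simply $F_{ji}\, d_j / d_i$. The variable names are chosen here for illustration.

```python
from fractions import Fraction

# Data of Example 13.6: D = diag(d) and the matrix F.
d = [1, 2, 1]
F = [[1, 2, 2],
     [1, 0, 1],
     [2, 0, 0]]
n = len(d)

# Entry (i, j) of D^{-1} F^T D for a diagonal matrix D.
F_ad = [[Fraction(F[j][i] * d[j], d[i]) for j in range(n)] for i in range(n)]

# Check <f(e_j), e_i> = <e_j, f_ad(e_i)> for all canonical basis vectors,
# i.e. d_i * F[i][j] == F_ad[j][i] * d_j (this is D F = F_ad^T D entrywise).
adjoint_relation_holds = all(
    d[i] * F[i][j] == F_ad[j][i] * d[j] for i in range(n) for j in range(n)
)

print(F_ad, adjoint_relation_holds)
```

By bilinearity, verifying the relation on the canonical basis vectors is enough to conclude it for all $v, w \in \mathbb{R}^{3,1}$.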


We now show that uniquely determined adjoint maps also exist in the unitary case. 
However, we cannot conclude this directly from Corollary 13.4, since a scalar product 
on a C-vector space is not a symmetric bilinear form, but a Hermitian sesquilinear 
form. In order to show the existence of the adjoint map in the unitary case we construct 
it explicitly. This construction works also in the Euclidean case. 

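Let $V$ be a unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$ and let $\{u_1, \dots, u_n\}$ be an orthonormal basis of $V$. For a given $f \in L(V, V)$ we define the map

$$g : V \to V, \quad v \mapsto \sum_{i=1}^{n} \langle v, f(u_i)\rangle\, u_i.$$

If $v, w \in V$ and $\lambda, \mu \in \mathbb{C}$, then

$$g(\lambda v + \mu w) = \sum_{i=1}^{n} \langle \lambda v + \mu w, f(u_i)\rangle\, u_i = \sum_{i=1}^{n} \big(\lambda \langle v, f(u_i)\rangle\, u_i + \mu \langle w, f(u_i)\rangle\, u_i\big) = \lambda g(v) + \mu g(w),$$

and hence $g \in L(V, V)$. Let now $v = \sum_{i=1}^{n} \lambda_i u_i \in V$ and $w \in V$, then

$$\langle v, g(w)\rangle = \Big\langle \sum_{i=1}^{n} \lambda_i u_i,\ \sum_{j=1}^{n} \langle w, f(u_j)\rangle\, u_j \Big\rangle = \sum_{i=1}^{n} \lambda_i\, \overline{\langle w, f(u_i)\rangle} = \sum_{i=1}^{n} \lambda_i \langle f(u_i), w\rangle = \langle f(v), w\rangle.$$

Furthermore,

$$\langle v, f(w)\rangle = \overline{\langle f(w), v\rangle} = \overline{\langle w, g(v)\rangle} = \langle g(v), w\rangle$$

for all $v, w \in V$. If $\tilde g \in L(V, V)$ satisfies $\langle f(v), w\rangle = \langle v, \tilde g(w)\rangle$ for all $v, w \in V$, then $\tilde g = g$, since the scalar product is positive definite. We can therefore formulate the following result analogously to Corollary 13.5.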


Corollary 13.7 If $V$ is a finite dimensional unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, then for every $f \in L(V, V)$ there exists a unique $f^{ad} \in L(V, V)$ with

$$\langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle \quad\text{and}\quad \langle v, f(w)\rangle = \langle f^{ad}(v), w\rangle \tag{13.5}$$

for all $v, w \in V$. The map $f^{ad}$ is called the adjoint of $f$ (with respect to $\langle\cdot,\cdot\rangle$).

As in the Euclidean case, the validity of one of the two equations in (13.5) for all $v, w \in V$ implies the validity of the other for all $v, w \in V$.


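Example 13.8 Consider the unitary vector space $\mathbb{C}^{3,1}$ with the scalar product

$$\langle v, w\rangle = w^H D v, \quad\text{where}\quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and the linear map

$$f : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1}, \quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 1 & 2i & 2 \\ i & 0 & -i \\ 2 & 0 & 3i \end{bmatrix}.$$

For all $v, w \in \mathbb{C}^{3,1}$ we then have

$$\langle f(v), w\rangle = w^H D F v = w^H D F D^{-1} D v = (D^{-H} F^H D^H w)^H D v = \langle v, f^{ad}(w)\rangle,$$

and thus

$$f^{ad} : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1}, \quad v \mapsto D^{-1} F^H D v = \begin{bmatrix} 1 & -2i & 2 \\ -i & 0 & 0 \\ 2 & 2i & -3i \end{bmatrix} v,$$

where we have used that $D$ is real and symmetric.

We next investigate the properties of the adjoint map.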


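Lemma 13.9 Let $V$ be a finite dimensional Euclidean or unitary vector space.

(1) If $f_1, f_2 \in L(V, V)$ and $\lambda_1, \lambda_2 \in K$ (where $K = \mathbb{R}$ in the Euclidean and $K = \mathbb{C}$ in the unitary case), then

$$(\lambda_1 f_1 + \lambda_2 f_2)^{ad} = \overline{\lambda_1}\, f_1^{ad} + \overline{\lambda_2}\, f_2^{ad}.$$

In the Euclidean case the map $f \mapsto f^{ad}$ is therefore linear, and in the unitary case semilinear.

(2) We have $(\mathrm{Id}_V)^{ad} = \mathrm{Id}_V$.

(3) For every $f \in L(V, V)$ we have $(f^{ad})^{ad} = f$.

(4) If $f_1, f_2 \in L(V, V)$, then $(f_1 \circ f_2)^{ad} = f_2^{ad} \circ f_1^{ad}$.

Proof

(1) If $v, w \in V$ and $\lambda_1, \lambda_2 \in K$, then

$$\begin{aligned}
\langle (\lambda_1 f_1 + \lambda_2 f_2)(v), w\rangle &= \lambda_1 \langle f_1(v), w\rangle + \lambda_2 \langle f_2(v), w\rangle \\
&= \lambda_1 \langle v, f_1^{ad}(w)\rangle + \lambda_2 \langle v, f_2^{ad}(w)\rangle \\
&= \langle v, \overline{\lambda_1}\, f_1^{ad}(w) + \overline{\lambda_2}\, f_2^{ad}(w)\rangle \\
&= \langle v, (\overline{\lambda_1}\, f_1^{ad} + \overline{\lambda_2}\, f_2^{ad})(w)\rangle,
\end{aligned}$$

and thus $(\lambda_1 f_1 + \lambda_2 f_2)^{ad} = \overline{\lambda_1}\, f_1^{ad} + \overline{\lambda_2}\, f_2^{ad}$.

(2) For all $v, w \in V$ we have $\langle \mathrm{Id}_V(v), w\rangle = \langle v, w\rangle = \langle v, \mathrm{Id}_V(w)\rangle$, and thus $(\mathrm{Id}_V)^{ad} = \mathrm{Id}_V$.

(3) For all $v, w \in V$ we have $\langle f^{ad}(v), w\rangle = \langle v, f(w)\rangle$, and thus $(f^{ad})^{ad} = f$.

(4) For all $v, w \in V$ we have

$$\langle (f_1 \circ f_2)(v), w\rangle = \langle f_1(f_2(v)), w\rangle = \langle f_2(v), f_1^{ad}(w)\rangle = \langle v, f_2^{ad}(f_1^{ad}(w))\rangle = \langle v, (f_2^{ad} \circ f_1^{ad})(w)\rangle,$$

and thus $(f_1 \circ f_2)^{ad} = f_2^{ad} \circ f_1^{ad}$. □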


The following result shows relations between the image and kernel of an endo- 
morphism and of its adjoint. 


Theorem 13.10 If $V$ is a finite dimensional Euclidean or unitary vector space and $f \in L(V, V)$, then the following assertions hold:

(1) $\ker(f^{ad}) = \mathrm{im}(f)^{\perp}$.
(2) $\ker(f) = \mathrm{im}(f^{ad})^{\perp}$.

Proof

(1) If $w \in \ker(f^{ad})$, then $f^{ad}(w) = 0$ and

$$0 = \langle v, f^{ad}(w)\rangle = \langle f(v), w\rangle$$

for all $v \in V$, hence $w \in \mathrm{im}(f)^{\perp}$. If, on the other hand, $w \in \mathrm{im}(f)^{\perp}$, then

$$0 = \langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle$$

for all $v \in V$. Since $\langle\cdot,\cdot\rangle$ is non-degenerate, we have $f^{ad}(w) = 0$ and, hence, $w \in \ker(f^{ad})$.

(2) Using $(f^{ad})^{ad} = f$ and (1) we get $\ker(f) = \ker((f^{ad})^{ad}) = \mathrm{im}(f^{ad})^{\perp}$. □


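Example 13.11 Consider the unitary vector space $\mathbb{C}^{3,1}$ with the standard scalar product and the linear map

$$f : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1}, \quad v \mapsto Fv, \quad\text{with}\quad F = \begin{bmatrix} 1 & i & i \\ i & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$

Then

$$f^{ad} : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1}, \quad v \mapsto F^H v, \quad\text{with}\quad F^H = \begin{bmatrix} 1 & -i & 1 \\ -i & 0 & 0 \\ -i & 0 & 0 \end{bmatrix}.$$

The matrices $F$ and $F^H$ have rank 2. Therefore, $\dim(\ker(f)) = \dim(\ker(f^{ad})) = 1$. A simple calculation shows that

$$\ker(f) = \mathrm{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \right\} \quad\text{and}\quad \ker(f^{ad}) = \mathrm{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ i \end{bmatrix} \right\}.$$

The dimension formula for linear maps implies that $\dim(\mathrm{im}(f)) = \dim(\mathrm{im}(f^{ad})) = 2$. From the matrices $F$ and $F^H$ we can see that

$$\mathrm{im}(f) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ i \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\} \quad\text{and}\quad \mathrm{im}(f^{ad}) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ -i \\ -i \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\}.$$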
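The kernels and images stated in this example, and the orthogonality relations of Theorem 13.10, can be confirmed with a few lines of complex arithmetic. This is only a sketch; the helper names `matvec` and `inner` are chosen here, and `inner(x, y)` implements the standard scalar product $\langle x, y\rangle = y^H x$.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Standard scalar product <x, y> = y^H x on C^{n,1}."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))

# Matrix F of Example 13.11 and its Hermitian transpose.
F = [[1, 1j, 1j],
     [1j, 0, 0],
     [1, 0, 0]]
F_H = [[complex(F[j][i]).conjugate() for j in range(3)] for i in range(3)]

# Spanning vectors as stated in the example.
ker_f = [0, 1, -1]
ker_f_ad = [0, 1, 1j]
im_f = [[1, 1j, 1], [1, 0, 0]]
im_f_ad = [[1, -1j, -1j], [1, 0, 0]]

# The kernel vectors are mapped to zero.
in_kernels = (all(z == 0 for z in matvec(F, ker_f))
              and all(z == 0 for z in matvec(F_H, ker_f_ad)))

# ker(f^ad) is orthogonal to im(f), and ker(f) is orthogonal to im(f^ad).
orthogonal = (all(inner(x, ker_f_ad) == 0 for x in im_f)
              and all(inner(x, ker_f) == 0 for x in im_f_ad))

print(F_H, in_kernels, orthogonal)
```

All entries are small Gaussian integers, so the floating point comparisons with zero are exact here.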


The equations $\ker(f^{ad}) = \mathrm{im}(f)^{\perp}$ and $\ker(f) = \mathrm{im}(f^{ad})^{\perp}$ can be verified by direct computation.


13.2 Adjoint Endomorphisms and Matrices


We now study the relation between the matrix representations of an endomorphism and its adjoint. Let $V$ be a finite dimensional unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$ and let $f \in L(V, V)$. For an orthonormal basis $B = \{u_1, \dots, u_n\}$ of $V$ let $[f]_{B,B} = [a_{ij}] \in \mathbb{C}^{n,n}$, i.e.,

$$f(u_j) = \sum_{k=1}^{n} a_{kj} u_k, \quad j = 1, \dots, n,$$

and hence, since $B$ is orthonormal,

$$a_{ij} = \langle f(u_j), u_i\rangle, \quad i, j = 1, \dots, n.$$

If $[f^{ad}]_{B,B} = [b_{ij}] \in \mathbb{C}^{n,n}$, i.e.,

$$f^{ad}(u_j) = \sum_{k=1}^{n} b_{kj} u_k, \quad j = 1, \dots, n,$$

then

$$b_{ij} = \langle f^{ad}(u_j), u_i\rangle = \langle u_j, f(u_i)\rangle = \overline{\langle f(u_i), u_j\rangle} = \overline{a_{ji}}.$$

Thus, $[f^{ad}]_{B,B} = ([f]_{B,B})^H$. The same holds for a finite dimensional Euclidean vector space, but then we can omit the complex conjugation. Therefore, we have shown the following result.


Theorem 13.12 If $V$ is a finite dimensional Euclidean or unitary vector space with the orthonormal basis $B$ and $f \in L(V, V)$, then

$$[f^{ad}]_{B,B} = ([f]_{B,B})^H.$$

(In the Euclidean case $([f]_{B,B})^H = ([f]_{B,B})^T$.)

An important special class are the selfadjoint endomorphisms.


Definition 13.13 Let $V$ be a finite dimensional Euclidean or unitary vector space. An endomorphism $f \in L(V, V)$ is called selfadjoint when $f = f^{ad}$.


Trivial examples of selfadjoint endomorphisms in $L(V, V)$ are $f = 0$ and $\mathrm{Id}_V$.


Corollary 13.14 


(1) If $V$ is a finite dimensional Euclidean vector space, $f \in L(V, V)$ is selfadjoint and $B$ is an orthonormal basis of $V$, then $[f]_{B,B}$ is a symmetric matrix.

(2) If $V$ is a finite dimensional unitary vector space, $f \in L(V, V)$ is selfadjoint and $B$ is an orthonormal basis of $V$, then $[f]_{B,B}$ is a Hermitian matrix.


The selfadjoint endomorphisms again form a vector space. However, one has to be careful to use the appropriate field over which this vector space is defined. In particular, the set of selfadjoint endomorphisms on a unitary vector space $V$ does not form a $\mathbb{C}$-vector space. If $f = f^{ad} \in L(V, V) \setminus \{0\}$, then $(if)^{ad} = -i f^{ad} = -if \neq if$ (cp. (1) in Lemma 13.9). Similarly, the Hermitian matrices in $\mathbb{C}^{n,n}$ do not form a $\mathbb{C}$-vector space. If $A = A^H \in \mathbb{C}^{n,n} \setminus \{0\}$ is Hermitian, then $(iA)^H = -iA^H = -iA \neq iA$.

Lemma 13.15 


(1) If $V$ is an $n$-dimensional Euclidean vector space, then the set of selfadjoint endomorphisms $\{f \in L(V, V) \mid f = f^{ad}\}$ forms an $\mathbb{R}$-vector space of dimension $n(n + 1)/2$.

(2) If $V$ is an $n$-dimensional unitary vector space, then the set of selfadjoint endomorphisms $\{f \in L(V, V) \mid f = f^{ad}\}$ forms an $\mathbb{R}$-vector space of dimension $n^2$.


Proof Exercise. □


A matrix $A \in \mathbb{C}^{n,n}$ with $A = A^T$ is called complex symmetric. Unlike the Hermitian matrices, the complex symmetric matrices form a $\mathbb{C}$-vector space.


Lemma 13.16 The set of complex symmetric matrices in $\mathbb{C}^{n,n}$ forms a $\mathbb{C}$-vector space of dimension $n(n + 1)/2$.


Proof Exercise. □


Lemmas 13.15 and 13.16 will be used in Chap. 15 in our proof of the Fundamental 
Theorem of Algebra. 


Exercises 


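13.1. Let $\beta(v, w) = w^T B v$ with $B = \mathrm{diag}(1, -1)$ be defined for $v, w \in \mathbb{R}^{2,1}$. Consider the linear maps $f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}$, $v \mapsto Fv$, and $h : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}$, $w \mapsto Hw$, where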


12 22 _ {10 2.2 
F=|9;{eR H=| {eR . 


Determine $\beta_v$, $\beta^{(1)}$ and $\beta^{(2)}$ as in (13.1)-(13.2) as well as the right adjoint of $f$ and the left adjoint of $h$ with respect to $\beta$.

13.2. Let $(V, \langle\cdot,\cdot\rangle_V)$ and $(W, \langle\cdot,\cdot\rangle_W)$ be two finite dimensional Euclidean vector spaces and let $f \in L(V, W)$. Show that there exists a unique $g \in L(W, V)$ with $\langle f(v), w\rangle_W = \langle v, g(w)\rangle_V$ for all $v \in V$ and $w \in W$.

13.3. Let $\langle v, w\rangle = w^T B v$ for all $v, w \in \mathbb{R}^{2,1}$ with

$$B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}.$$

(a) Show that $\langle v, w\rangle = w^T B v$ is a scalar product on $\mathbb{R}^{2,1}$.

(b) Using this scalar product, determine the adjoint map $f^{ad}$ of $f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}$, $v \mapsto Fv$, with $F \in \mathbb{R}^{2,2}$.

(c) Investigate which properties $F$ needs to satisfy so that $f$ is selfadjoint.

13.4. Let $n \ge 2$ and

$$f : \mathbb{R}^{n,1} \to \mathbb{R}^{n,1}, \quad [x_1, \dots, x_n]^T \mapsto [0, x_1, \dots, x_{n-1}]^T.$$

Determine the adjoint $f^{ad}$ of $f$ with respect to the standard scalar product of $\mathbb{R}^{n,1}$.

13.5. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in L(V, V)$. Show that $\ker(f^{ad} \circ f) = \ker(f)$ and $\mathrm{im}(f^{ad} \circ f) = \mathrm{im}(f^{ad})$.

13.6. Let $V$ be a finite dimensional Euclidean or unitary vector space, let $U \subseteq V$ be a subspace and let $f \in L(V, V)$ with $f(U) \subseteq U$. Show that then $f^{ad}(U^{\perp}) \subseteq U^{\perp}$.

13.7. Let $V$ be a finite dimensional Euclidean or unitary vector space, let $f \in L(V, V)$ and $v \in V$. Show that $v \in \mathrm{im}(f)$ if and only if $v \in \ker(f^{ad})^{\perp}$. "Matrix version": For $A \in \mathbb{C}^{n,n}$ and $b \in \mathbb{C}^{n,1}$ the linear system of equations $Ax = b$ has a solution if and only if $b \in \mathscr{L}(A^H, 0)^{\perp}$.

13.8. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f, g \in L(V, V)$ be selfadjoint. Show that $f \circ g$ is selfadjoint if and only if $f$ and $g$ commute, i.e., $f \circ g = g \circ f$.

13.9. Let $V$ be a finite dimensional unitary vector space and let $f \in L(V, V)$. Show that $f$ is selfadjoint if and only if $\langle f(v), v\rangle \in \mathbb{R}$ holds for all $v \in V$.

13.10. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in L(V, V)$ be a projection, i.e., $f$ satisfies $f^2 = f$. Show that $f$ is selfadjoint if and only if $\ker(f) \perp \mathrm{im}(f)$, i.e., $\langle v, w\rangle = 0$ holds for all $v \in \ker(f)$ and $w \in \mathrm{im}(f)$.

13.11. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f, g \in L(V, V)$. Show that if $g^{ad} \circ f = 0 \in L(V, V)$, then $\langle v, w\rangle = 0$ holds for all $v \in \mathrm{im}(f)$ and $w \in \mathrm{im}(g)$.

13.12. For two polynomials $p, q \in \mathbb{R}[t]_{\le n}$ let

$$\langle p, q\rangle = \int_{-1}^{1} p(t)\, q(t)\, dt.$$

(a) Show that this defines a scalar product on $\mathbb{R}[t]_{\le n}$.

(b) Consider the map

$$f : \mathbb{R}[t]_{\le n} \to \mathbb{R}[t]_{\le n}, \quad p = \sum_{i=0}^{n} \alpha_i t^i \mapsto \sum_{i=1}^{n} i\, \alpha_i t^{i-1},$$

and determine $f^{ad}$, $\ker(f^{ad})$, $\mathrm{im}(f)$, $\ker(f^{ad})^{\perp}$ and $\mathrm{im}(f)^{\perp}$.

13.13. Prove Lemma 13.15.

13.14. Prove Lemma 13.16.


Chapter 14 
Eigenvalues of Endomorphisms 


In previous chapters we have already studied eigenvalues and eigenvectors of matri- 
ces. In this chapter we generalize these concepts to endomorphisms, and we inves- 
tigate when endomorphisms on finite dimensional vector spaces can be represented 
by diagonal matrices or (upper) triangular matrices. From such representations we 
easily can read off important information about the endomorphism, in particular its 
eigenvalues. 


14.1 Basic Definitions and Properties 
We first consider an arbitrary vector space and then concentrate on the finite dimen- 
sional case. 


Definition 14.1 Let $V$ be a $K$-vector space and $f \in L(V, V)$. If $\lambda \in K$ and $v \in V \setminus \{0\}$ satisfy

$$f(v) = \lambda v,$$

then $\lambda$ is called an eigenvalue of $f$, and $v$ is called an eigenvector of $f$ corresponding to $\lambda$.


By definition, $v = 0$ cannot be an eigenvector, but an eigenvalue $\lambda = 0$ may occur (cp. the example following Definition 8.7). The equation $f(v) = \lambda v$ can be written as

$$0 = \lambda v - f(v) = (\lambda\, \mathrm{Id}_V - f)(v).$$

Hence, $\lambda \in K$ is an eigenvalue of $f$ if and only if

$$\ker(\lambda\, \mathrm{Id}_V - f) \neq \{0\}.$$

We already know that the kernel of an endomorphism on $V$ forms a subspace of $V$ (cp. Lemma 10.7). This holds, in particular, for $\ker(\lambda\, \mathrm{Id}_V - f)$.


Definition 14.2 If $V$ is a $K$-vector space and $\lambda \in K$ is an eigenvalue of $f \in L(V, V)$, then the subspace

$$V_f(\lambda) := \ker(\lambda\, \mathrm{Id}_V - f)$$

is called the eigenspace of $f$ corresponding to $\lambda$ and

$$g(\lambda, f) := \dim(V_f(\lambda))$$

is called the geometric multiplicity of the eigenvalue $\lambda$.


By definition, the eigenspace $V_f(\lambda)$ is spanned by all eigenvectors of $f$ corresponding to the eigenvalue $\lambda$. If $V_f(\lambda)$ is finite dimensional, then $g(\lambda, f) = \dim(V_f(\lambda))$ is equal to the maximal number of linearly independent eigenvectors of $f$ corresponding to $\lambda$.


Definition 14.3 Let $V$ be a $K$-vector space, let $U \subseteq V$ be a subspace, and let $f \in L(V, V)$. If $f(U) \subseteq U$, i.e., if $f(u) \in U$ holds for all $u \in U$, then $U$ is called an $f$-invariant subspace of $V$.


An important example of f-invariant subspaces are the eigenspaces of f. 


Lemma 14.4 If $V$ is a $K$-vector space and $\lambda \in K$ is an eigenvalue of $f \in L(V, V)$, then $V_f(\lambda)$ is an $f$-invariant subspace of $V$.


Proof For every $v \in V_f(\lambda)$ we have $f(v) = \lambda v \in V_f(\lambda)$. □


We now consider finite dimensional vector spaces and discuss the relationship 
between the eigenvalues of f and the eigenvalues of a matrix representation of f 
with respect to a given basis. 


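Lemma 14.5 If $V$ is a finite dimensional $K$-vector space and $f \in L(V, V)$, then the following statements are equivalent:

(1) $\lambda \in K$ is an eigenvalue of $f$.
(2) $\lambda \in K$ is an eigenvalue of the matrix $[f]_{B,B}$ for every basis $B$ of $V$.


Proof Let $\lambda \in K$ be an eigenvalue of $f$ and let $B = \{v_1, \dots, v_n\}$ be an arbitrary basis of $V$. If $v \in V$ is an eigenvector of $f$ corresponding to the eigenvalue $\lambda$, then $f(v) = \lambda v$ and there exist (unique) coordinates $\mu_1, \dots, \mu_n \in K$, not all equal to zero, with $v = \sum_{j=1}^{n} \mu_j v_j$. Using (10.4) we obtain

$$[f]_{B,B} \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \Phi_B(f(v)) = \Phi_B(\lambda v) = \lambda\, \Phi_B(v) = \lambda \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix},$$

and thus $\lambda$ is an eigenvalue of $[f]_{B,B}$.

If, on the other hand, $[f]_{B,B} [\mu_1, \dots, \mu_n]^T = \lambda [\mu_1, \dots, \mu_n]^T$ with $[\mu_1, \dots, \mu_n]^T \neq 0$ for a given (arbitrary) basis $B = \{v_1, \dots, v_n\}$ of $V$, then we set $v := \sum_{j=1}^{n} \mu_j v_j$. Then $v \neq 0$ and

$$f(v) = \sum_{j=1}^{n} \mu_j f(v_j) = (f(v_1), \dots, f(v_n)) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \big((v_1, \dots, v_n)\, [f]_{B,B}\big) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = (v_1, \dots, v_n) \left( \lambda \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} \right) = \lambda v,$$

i.e., $\lambda$ is an eigenvalue of $f$. □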


Lemma 14.5 implies that the eigenvalues of $f$ are the roots of the characteristic polynomial of the matrix $[f]_{B,B}$ (cp. Theorem 8.8). This, however, does not hold in general for a matrix representation of the form $[f]_{B,\tilde B}$, where $B$ and $\tilde B$ are two different bases of $V$. In general, the two matrices

$$[f]_{B,\tilde B} = [\mathrm{Id}_V]_{B,\tilde B}\, [f]_{B,B} \quad\text{and}\quad [f]_{B,B}$$

do not have the same eigenvalues.


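Example 14.6 Consider the vector space $\mathbb{R}^{2,1}$ with the bases

$$B = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}, \quad \tilde B = \left\{ \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}.$$

Then the endomorphism

$$f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}, \quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

has the matrix representations

$$[f]_{B,B} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad [f]_{B,\tilde B} = \frac{1}{2} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}.$$

We have $\det(t I_2 - [f]_{B,B}) = t^2 - 1$, and thus $f$ has the eigenvalues $-1$ and $1$. On the other hand, the characteristic polynomial of $[f]_{B,\tilde B}$ is $t^2 - \frac{1}{2}$, so that this matrix has the eigenvalues $-1/\sqrt{2}$ and $1/\sqrt{2}$.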
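The eigenvalue comparison in this example can be reproduced for $2 \times 2$ matrices from the trace and determinant alone. The sketch below assumes that the two matrix representations read $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and $\frac{1}{2}\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$, which is consistent with the eigenvalues stated in the example; the function names are chosen here for illustration.

```python
import math
from fractions import Fraction

def char_poly_2x2(M):
    """Return (trace, det); the characteristic polynomial is t^2 - trace*t + det."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace, det

def real_roots(trace, det):
    """Real roots of t^2 - trace*t + det, sorted increasingly."""
    disc = trace * trace - 4 * det
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted([(float(trace) - s) / 2, (float(trace) + s) / 2])

half = Fraction(1, 2)
A_same_basis = [[0, 1], [1, 0]]               # assumed [f]_{B,B}
A_mixed_bases = [[-half, half], [half, half]]  # assumed [f]_{B, B~}

roots_same = real_roots(*char_poly_2x2(A_same_basis))
roots_mixed = real_roots(*char_poly_2x2(A_mixed_bases))
print(roots_same, roots_mixed)
```

Only `roots_same` gives the eigenvalues of $f$; the second matrix mixes two different bases and therefore has a different characteristic polynomial.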


For two different bases $B$ and $\tilde B$ of $V$ the matrices $[f]_{B,B}$ and $[f]_{\tilde B,\tilde B}$ are similar (cp. the discussion following Corollary 10.20). In Theorem 8.12 we have shown that similar matrices have the same characteristic polynomial. This justifies the following definition.


Definition 14.7 If $n \in \mathbb{N}$, $V$ is an $n$-dimensional $K$-vector space with the basis $B$, and $f \in L(V, V)$, then

$$P_f := \det(t I_n - [f]_{B,B}) \in K[t]$$

is called the characteristic polynomial of $f$.


The characteristic polynomial $P_f$ is always a monic polynomial with

$$\deg(P_f) = n = \dim(V).$$

As we have discussed before, $P_f$ is independent of the choice of the basis of $V$. A scalar $\lambda \in K$ is an eigenvalue of $f$ if and only if $\lambda$ is a root of $P_f$, i.e., $P_f(\lambda) = 0$. As shown in Example 8.9, in real vector spaces with dimensions at least two, there exist endomorphisms that do not have eigenvalues.

If $\lambda$ is a root of $P_f$, then $P_f = (t - \lambda) \cdot q$ for a monic polynomial $q \in K[t]$, i.e., the linear factor $t - \lambda$ divides the polynomial $P_f$; we will show this formally in Corollary 15.5 below. If also $q(\lambda) = 0$, then $q = (t - \lambda) \cdot \tilde q$ for a monic polynomial $\tilde q \in K[t]$, and thus $P_f = (t - \lambda)^2 \cdot \tilde q$. We can continue until $P_f = (t - \lambda)^d \cdot g$ for a $g \in K[t]$ with $g(\lambda) \neq 0$. This leads to the following definition.


Definition 14.8 Let $V$ be a finite dimensional $K$-vector space, and let $f \in L(V, V)$ have the eigenvalue $\lambda \in K$. If the characteristic polynomial of $f$ has the form

$$P_f = (t - \lambda)^d \cdot g$$

for some $g \in K[t]$ with $g(\lambda) \neq 0$, then $d$ is called the algebraic multiplicity of the eigenvalue $\lambda$ of $f$. It is denoted by $a(\lambda, f)$.

If $\lambda_1, \dots, \lambda_k$ are the pairwise distinct eigenvalues of $f$ with corresponding algebraic multiplicities $a(\lambda_1, f), \dots, a(\lambda_k, f)$, and if $\dim(V) = n$, then

$$a(\lambda_1, f) + \dots + a(\lambda_k, f) \le n,$$

since $\deg(P_f) = \dim(V) = n$.
Example 14.9 The endomorphism $f : \mathbb{R}^{4,1} \to \mathbb{R}^{4,1}$, $v \mapsto Fv$ with

$$F = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \in \mathbb{R}^{4,4},$$

has the characteristic polynomial $P_f = (t - 1)^2 (t^2 + 1)$. The only real root of $P_f$ is 1, and $a(1, f) = 2 < 4 = \dim(\mathbb{R}^{4,1})$.

Lemma 14.10 If $V$ is a finite dimensional $K$-vector space and $f \in L(V, V)$, then

$$g(\lambda, f) \le a(\lambda, f)$$

for every eigenvalue $\lambda$ of $f$.


Proof Let $\lambda \in K$ be an eigenvalue of $f$ with geometric multiplicity $m = g(\lambda, f)$. Then there exist $m$ linearly independent eigenvectors $v_1, \dots, v_m \in V$ of $f$ corresponding to the eigenvalue $\lambda$. If $m = \dim(V)$, then these $m$ eigenvectors form a basis $B$ of $V$. If $m < \dim(V) = n$, then we can extend the $m$ eigenvectors to a basis $B = \{v_1, \dots, v_m, v_{m+1}, \dots, v_n\}$ of $V$.

We have $f(v_j) = \lambda v_j$ for $j = 1, \dots, m$ and, therefore,

$$[f]_{B,B} = \begin{bmatrix} \lambda I_m & Z_1 \\ 0 & Z_2 \end{bmatrix}$$

for two matrices $Z_1 \in K^{m,n-m}$ and $Z_2 \in K^{n-m,n-m}$. Using (1) in Lemma 7.10 we obtain

$$P_f = \det(t I_n - [f]_{B,B}) = (t - \lambda)^m \cdot \det(t I_{n-m} - Z_2),$$

which implies $a(\lambda, f) \ge m = g(\lambda, f)$. □
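The inequality of Lemma 14.10 can be strict. As an illustration, the following sketch computes the geometric multiplicity of the eigenvalue 1 for the matrix $F$ of Example 14.9 as $n - \mathrm{rank}(\lambda I_n - F)$, using exact Gaussian elimination. The function names `rank` and `geometric_multiplicity` are chosen here; the algebraic multiplicity 2 is taken from the example and not recomputed.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def geometric_multiplicity(F, lam):
    """dim ker(lam * I - F) = n - rank(lam * I - F)."""
    n = len(F)
    shifted = [[(lam if i == j else 0) - F[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# Matrix of Example 14.9.
F = [[1, 2, 3, 4],
     [0, 1, 2, 3],
     [0, 0, 0, 1],
     [0, 0, -1, 0]]

g_1 = geometric_multiplicity(F, 1)
a_1 = 2  # algebraic multiplicity of the eigenvalue 1, as stated in Example 14.9
print(g_1, a_1)
```

For this matrix the eigenspace of the eigenvalue 1 is one-dimensional, so the geometric multiplicity is strictly smaller than the algebraic multiplicity.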


In the following we will try to find a basis of V, so that the eigenvalues of a 
given endomorphism f can be read off easily from its matrix representation. The 
easiest forms of matrices in this sense are diagonal and triangular matrices, since 
their eigenvalues are just their diagonal entries. 


14.2 Diagonalizability 


In this section we will analyze when a given endomorphism has a diagonal matrix
representation. We formally define this property as follows.


Definition 14.11 Let $V$ be a finite dimensional $K$-vector space. An endomorphism $f \in L(V, V)$ is called diagonalizable, if there exists a basis $B$ of $V$, such that $[f]_{B,B}$ is a diagonal matrix.


Accordingly, a matrix $A \in K^{n,n}$ is diagonalizable when there exists a matrix $S \in GL_n(K)$ with $A = S D S^{-1}$ for a diagonal matrix $D \in K^{n,n}$.
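As a small illustration of this matrix version of diagonalizability, the sketch below verifies a factorization $A = S D S^{-1}$ in exact arithmetic. The matrices `A`, `S` and `D` are hypothetical sample data chosen here, and `matmul` and `inv2` are helper names introduced for the illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Inverse of an invertible 2x2 matrix (exact arithmetic)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical example: the columns of S are eigenvectors of A,
# and D carries the corresponding eigenvalues 3 and 1.
A = [[2, 1], [1, 2]]
S = [[1, 1], [1, -1]]
D = [[3, 0], [0, 1]]

# A = S D S^{-1} ...
reconstructed = matmul(matmul(S, D), inv2(S))

# ... which is equivalent to A S = S D, i.e. A s_j = d_j s_j for each column s_j.
columns_are_eigenvectors = matmul(A, S) == matmul(S, D)

print(reconstructed, columns_are_eigenvectors)
```

The equivalent form $AS = SD$ makes visible that the columns of $S$ are eigenvectors of $A$, which connects the matrix definition with a basis of eigenvectors.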

In order to analyze the diagonalizability, we begin with a sufficient condition for the linear independence of eigenvectors. This condition also holds when $V$ is infinite dimensional.


Lemma 14.12 Let $V$ be a $K$-vector space and $f \in L(V, V)$. If $\lambda_1, \dots, \lambda_k \in K$, $k \ge 2$, are pairwise distinct eigenvalues of $f$ with corresponding eigenvectors $v_1, \dots, v_k \in V$, then $v_1, \dots, v_k$ are linearly independent.

Proof We prove the assertion by induction on $k$. Let $k = 2$ and let $v_1, v_2$ be eigenvectors of $f$ corresponding to the eigenvalues $\lambda_1 \neq \lambda_2$. Let $\mu_1, \mu_2 \in K$ with $\mu_1 v_1 + \mu_2 v_2 = 0$. Applying $f$ on both sides of this equation as well as multiplying the equation with $\lambda_2$ yields the two equations

$$\begin{aligned}
\mu_1 \lambda_1 v_1 + \mu_2 \lambda_2 v_2 &= 0, \\
\mu_1 \lambda_2 v_1 + \mu_2 \lambda_2 v_2 &= 0.
\end{aligned}$$

Subtracting the second equation from the first, we get $\mu_1 (\lambda_1 - \lambda_2) v_1 = 0$. Since $\lambda_1 \neq \lambda_2$ and $v_1 \neq 0$, we have $\mu_1 = 0$. Then from $\mu_1 v_1 + \mu_2 v_2 = 0$ we also obtain $\mu_2 = 0$, since $v_2 \neq 0$. Thus, $v_1$ and $v_2$ are linearly independent.

The proof of the inductive step is analogous. We assume that the assertion holds for some $k \ge 2$. Let $\lambda_1, \dots, \lambda_{k+1}$ be pairwise distinct eigenvalues of $f$ with corresponding eigenvectors $v_1, \dots, v_{k+1}$, and let $\mu_1, \dots, \mu_{k+1} \in K$ satisfy

$$\mu_1 v_1 + \dots + \mu_k v_k + \mu_{k+1} v_{k+1} = 0.$$

Applying $f$ to this equation yields

$$\mu_1 \lambda_1 v_1 + \dots + \mu_k \lambda_k v_k + \mu_{k+1} \lambda_{k+1} v_{k+1} = 0,$$

while a multiplication with $\lambda_{k+1}$ gives

$$\mu_1 \lambda_{k+1} v_1 + \dots + \mu_k \lambda_{k+1} v_k + \mu_{k+1} \lambda_{k+1} v_{k+1} = 0.$$

Subtracting this equation from the previous one we get

$$\mu_1 (\lambda_1 - \lambda_{k+1}) v_1 + \dots + \mu_k (\lambda_k - \lambda_{k+1}) v_k = 0.$$

Since $\lambda_1, \dots, \lambda_{k+1}$ are pairwise distinct and $v_1, \dots, v_k$ are linearly independent by the induction hypothesis, we obtain $\mu_1 = \dots = \mu_k = 0$. But then $\mu_{k+1} v_{k+1} = 0$ implies that also $\mu_{k+1} = 0$, so that $v_1, \dots, v_{k+1}$ are linearly independent. □


Using this result we next show that the sum of eigenspaces corresponding to 
pairwise distinct eigenvalues is direct (cp. Theorem 9.31). 


Lemma 14.13 Let $V$ be a $K$-vector space and $f \in L(V, V)$. If $\lambda_1, \dots, \lambda_k \in K$, $k \ge 2$, are pairwise distinct eigenvalues of $f$, then the corresponding eigenspaces satisfy

$$V_f(\lambda_i) \cap \sum_{\substack{j=1 \\ j \neq i}}^{k} V_f(\lambda_j) = \{0\}$$

for all $i = 1, \dots, k$.

Proof Let $i$ be fixed and let

$$v \in V_f(\lambda_i) \cap \sum_{\substack{j=1 \\ j \neq i}}^{k} V_f(\lambda_j).$$

In particular, we have $v = \sum_{j \neq i} v_j$ for some $v_j \in V_f(\lambda_j)$, $j \neq i$. Then $-v + \sum_{j \neq i} v_j = 0$, and the linear independence of eigenvectors corresponding to pairwise distinct eigenvalues (cp. Lemma 14.12) implies $v = 0$. □


The following theorem gives necessary and sufficient conditions for the diago- 
nalizability of an endomorphism on a finite dimensional vector space. 


Theorem 14.14 IfV is a finite dimensional K -vector space and f € L(V, V), then 
the following statements are equivalent: 


(1) f is diagonalizable. 

(2) There exists a basis of V consisting of eigenvectors of f. 

(3) The characteristic polynomial Ps decomposes into n = dim(V) linear factors 
over K, i.e., 


Pea) a a) 


with the eigenvalues \\,..., An € K of f, and for every eigenvalue A; we have 


SAG, J) SOG .J): 
Proof


(1) $\Leftrightarrow$ (2): If $f \in L(V, V)$ is diagonalizable, then there exists a basis
$B = \{v_1, \ldots, v_n\}$ of $V$ and scalars $\lambda_1, \ldots, \lambda_n \in K$ with

\[ [f]_{B,B} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}, \tag{14.1} \]

and hence $f(v_j) = \lambda_j v_j$, $j = 1, \ldots, n$. The scalars $\lambda_1, \ldots, \lambda_n$ are thus eigenvalues
of $f$, and the corresponding eigenvectors are $v_1, \ldots, v_n$.
If, on the other hand, there exists a basis $B = \{v_1, \ldots, v_n\}$ of $V$ consisting of
eigenvectors of $f$, then $f(v_j) = \lambda_j v_j$, $j = 1, \ldots, n$, for scalars $\lambda_1, \ldots, \lambda_n \in K$
(the corresponding eigenvalues), and hence $[f]_{B,B}$ has the form (14.1).

(2) $\Rightarrow$ (3): Let $B = \{v_1, \ldots, v_n\}$ be a basis of $V$ consisting of eigenvectors of $f$,
and let $\lambda_1, \ldots, \lambda_n \in K$ be the corresponding eigenvalues. Then $[f]_{B,B}$ has the
form (14.1) and hence

\[ P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_n), \]

so that $P_f$ decomposes into linear factors over $K$.




We still have to show that $g(\lambda_j, f) = a(\lambda_j, f)$ for every eigenvalue $\lambda_j$. The
eigenvalue $\lambda_j$ has the algebraic multiplicity $m_j := a(\lambda_j, f)$ if and only if $\lambda_j$
occurs $m_j$ times on the diagonal of the (diagonal) matrix $[f]_{B,B}$. This holds if
and only if exactly $m_j$ vectors of the basis $B$ are eigenvectors of $f$ corresponding
to the eigenvalue $\lambda_j$. Each of these $m_j$ linearly independent vectors is an element
of the eigenspace $V_f(\lambda_j)$ and, hence,

\[ \dim(V_f(\lambda_j)) = g(\lambda_j, f) \geq m_j = a(\lambda_j, f). \]

From Lemma 14.10 we know that $g(\lambda_j, f) \leq a(\lambda_j, f)$, and thus $g(\lambda_j, f) = a(\lambda_j, f)$.

(3) $\Rightarrow$ (2): Let $\widetilde{\lambda}_1, \ldots, \widetilde{\lambda}_k$ be the pairwise distinct eigenvalues of $f$ with
corresponding geometric and algebraic multiplicities $g(\widetilde{\lambda}_j, f)$ and $a(\widetilde{\lambda}_j, f)$, $j = 1, \ldots, k$,
respectively. Since $P_f$ decomposes into linear factors, we have

\[ \sum_{j=1}^{k} a(\widetilde{\lambda}_j, f) = n = \dim(V). \]

Now $g(\widetilde{\lambda}_j, f) = a(\widetilde{\lambda}_j, f)$, $j = 1, \ldots, k$, implies that

\[ \sum_{j=1}^{k} g(\widetilde{\lambda}_j, f) = n = \dim(V). \]

By Lemma 14.13 we obtain (cp. also Theorem 9.31)

\[ V_f(\widetilde{\lambda}_1) \oplus \ldots \oplus V_f(\widetilde{\lambda}_k) = V. \]

If we select bases of the respective eigenspaces $V_f(\widetilde{\lambda}_j)$, $j = 1, \ldots, k$, then we
get a basis of $V$ that consists of eigenvectors of $f$. □


Theorem 14.14 and Lemma 14.12 imply an important sufficient condition for
diagonalizability.


Corollary 14.15 If $V$ is an $n$-dimensional $K$-vector space and $f \in L(V, V)$ has $n$
pairwise distinct eigenvalues, then $f$ is diagonalizable.


The condition of having $n = \dim(V)$ pairwise distinct eigenvalues is, however, not
necessary for the diagonalizability of an endomorphism. A simple counterexample
is the identity $\mathrm{Id}_V$, which has the $n$-fold eigenvalue 1, while $[\mathrm{Id}_V]_{B,B} = I_n$ holds
for every basis $B$ of $V$. On the other hand, there exist endomorphisms with multiple
eigenvalues that are not diagonalizable.
Example 14.16 The endomorphism

\[ f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}, \quad v \mapsto Fv \quad \text{with} \quad F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \]

has the characteristic polynomial $(t - 1)^2$ and thus only has the eigenvalue 1. We
have $V_f(1) = \ker(I_2 - F) = \mathrm{span}\{[1, 0]^T\}$ and thus $g(1, f) = 1 < a(1, f) = 2$. By
Theorem 14.14, $f$ is not diagonalizable.
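
The multiplicity count in Example 14.16 can be checked mechanically. The following is a minimal Python sketch, assuming exact rational arithmetic; it computes the geometric multiplicity as $n - \mathrm{rank}(F - \lambda I_n)$ by Gaussian elimination. The helper names `rank` and `geometric_multiplicity` are illustrative choices and do not appear in the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0]) if m else 0):
        # look for a pivot in this column at or below row r
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def geometric_multiplicity(F, lam):
    """dim ker(F - lam*I) = n - rank(F - lam*I) for a square matrix F."""
    n = len(F)
    shifted = [[F[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

F = [[1, 1], [0, 1]]
print(geometric_multiplicity(F, 1))  # compare with a(1, f) = 2
```

For the identity matrix the same call returns the full dimension, which matches the remark that $\mathrm{Id}_V$ is diagonalizable although its eigenvalue is repeated.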


14.3 Triangulation and Schur’s Theorem 


If the property $g(\lambda_j, f) = a(\lambda_j, f)$ does not hold for every eigenvalue $\lambda_j$ of $f$,
then $f$ is not diagonalizable. However, as long as the characteristic polynomial $P_f$
decomposes into linear factors, we can find a special basis $B$ such that $[f]_{B,B}$ is a
triangular matrix.


Theorem 14.17 If $V$ is a finite dimensional $K$-vector space and $f \in L(V, V)$, then
the following statements are equivalent:


(1) The characteristic polynomial $P_f$ decomposes into linear factors over $K$.
(2) There exists a basis $B$ of $V$ such that $[f]_{B,B}$ is upper triangular, i.e., $f$ can be
triangulated.


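Proof


(2) $\Rightarrow$ (1): If $n = \dim(V)$ and $[f]_{B,B} = [r_{ij}] \in K^{n,n}$ is upper triangular, then
$P_f = (t - r_{11}) \cdot \ldots \cdot (t - r_{nn})$.
(1) $\Rightarrow$ (2): We show the assertion by induction on $n = \dim(V)$. The case $n = 1$ is
trivial, since then $[f]_{B,B} \in K^{1,1}$.
Suppose that the assertion holds for an $n \geq 1$, and let $\dim(V) = n + 1$. By
assumption,

\[ P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_{n+1}), \]

where $\lambda_1, \ldots, \lambda_{n+1} \in K$ are the eigenvalues of $f$. Let $v_1 \in V$ be an eigenvector
corresponding to the eigenvalue $\lambda_1$. We extend this vector to a basis
$B = \{v_1, w_2, \ldots, w_{n+1}\}$ of $V$. With $B_W := \{w_2, \ldots, w_{n+1}\}$ and $W := \mathrm{span}\, B_W$
we have $V = \mathrm{span}\{v_1\} \oplus W$ and

\[ [f]_{B,B} = \begin{bmatrix} \lambda_1 & a_{12} & \cdots & a_{1,n+1} \\ 0 & a_{22} & \cdots & a_{2,n+1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{n+1,2} & \cdots & a_{n+1,n+1} \end{bmatrix}. \]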
We define $h \in L(W, \mathrm{span}\{v_1\})$ and $g \in L(W, W)$ by

\[ h(w_j) := a_{1j} v_1 \quad \text{and} \quad g(w_j) := \sum_{k=2}^{n+1} a_{kj} w_k, \quad j = 2, \ldots, n + 1. \]

Then $f(w) = h(w) + g(w)$ for all $w \in W$, and

\[ [f]_{B,B} = \begin{bmatrix} \lambda_1 & [h]_{B_W,\{v_1\}} \\ 0 & [g]_{B_W,B_W} \end{bmatrix}. \]

Consequently,

\[ (t - \lambda_1) P_g = P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_{n+1}), \]

and hence $P_g = (t - \lambda_2) \cdot \ldots \cdot (t - \lambda_{n+1})$. Now $\dim(W) = n$ and the
characteristic polynomial of $g \in L(W, W)$ decomposes into linear factors. By the
induction hypothesis there exists a basis $\widehat{B}_W = \{\widehat{w}_2, \ldots, \widehat{w}_{n+1}\}$ of $W$ such that
$[g]_{\widehat{B}_W,\widehat{B}_W}$ is upper triangular. Thus, for the basis $B_1 := \{v_1, \widehat{w}_2, \ldots, \widehat{w}_{n+1}\}$ the
matrix $[f]_{B_1,B_1}$ is upper triangular. □


A “matrix version” of this theorem reads as follows: The characteristic polynomial
$P_A$ of $A \in K^{n,n}$ decomposes into linear factors over $K$ if and only if $A$ can be
triangulated, i.e., there exists a matrix $S \in GL_n(K)$ with $A = SRS^{-1}$ for an upper
triangular matrix $R \in K^{n,n}$.


Corollary 14.18 Let $V$ be a finite dimensional Euclidean or unitary vector space
and $f \in L(V, V)$. If $P_f$ decomposes over $\mathbb{R}$ (in the Euclidean case) or $\mathbb{C}$ (in
the unitary case) into linear factors, then there exists an orthonormal basis $B$ of $V$,
such that $[f]_{B,B}$ is upper triangular.


Proof If $P_f$ decomposes into linear factors, then by Theorem 14.17 there exists a
basis $B_1$ of $V$, such that $[f]_{B_1,B_1}$ is upper triangular. Applying the Gram-Schmidt
method to the basis $B_1$, we obtain an orthonormal basis $B_2$ of $V$, such that $[\mathrm{Id}_V]_{B_1,B_2}$
is upper triangular (cp. Theorem 12.11). Then

\[ [f]_{B_2,B_2} = [\mathrm{Id}_V]_{B_1,B_2}\, [f]_{B_1,B_1}\, [\mathrm{Id}_V]_{B_2,B_1} = [\mathrm{Id}_V]_{B_1,B_2}\, [f]_{B_1,B_1}\, [\mathrm{Id}_V]_{B_1,B_2}^{-1}. \]

The invertible upper triangular matrices form a group with respect to the matrix
multiplication (cp. Theorem 4.13). Thus, all matrices in the product on the right
hand side are upper triangular, and hence $[f]_{B_2,B_2}$ is upper triangular. □


Example 14.19 Consider the Euclidean vector space $\mathbb{R}[t]_{\leq 1}$ with the scalar product
$\langle p, q \rangle = \int_0^1 p(t) q(t)\, dt$, and the endomorphism

\[ f : \mathbb{R}[t]_{\leq 1} \to \mathbb{R}[t]_{\leq 1}, \quad \alpha_1 t + \alpha_0 \mapsto 2\alpha_1 t + \alpha_0. \]

We have $f(1) = 1$ and $f(t) = 2t$, i.e., the polynomials $1$ and $t$ are eigenvectors of
$f$ corresponding to the (distinct) eigenvalues 1 and 2. Thus, $B = \{1, t\}$ is a basis
of $\mathbb{R}[t]_{\leq 1}$, and $[f]_{B,B}$ is a diagonal matrix. Note that $B$ is not an orthonormal basis,
since in particular $\langle 1, t \rangle \neq 0$.

Since $P_f$ decomposes into linear factors, Corollary 14.18 guarantees the existence
of an orthonormal basis $\widehat{B}$ for which $[f]_{\widehat{B},\widehat{B}}$ is upper triangular. In the proof of the
implication (1) $\Rightarrow$ (2) of Theorem 14.17 one chooses any eigenvector of $f$, and
then proceeds inductively in order to obtain the triangulation of $f$. In this example,
let us use $q_1 = 1$ as the first vector. This vector is an eigenvector of $f$ with norm
1 corresponding to the eigenvalue 1. If $q_2 \in \mathbb{R}[t]_{\leq 1}$ is a vector with norm 1 and
$\langle q_1, q_2 \rangle = 0$, then $\widehat{B} = \{q_1, q_2\}$ is an orthonormal basis for which $[f]_{\widehat{B},\widehat{B}}$ is an upper
triangular matrix. We construct the vector $q_2$ by orthogonalizing $t$ against $q_1$ using
the Gram-Schmidt method:

\[ \widehat{q}_2 = t - \langle t, q_1 \rangle q_1 = t - \tfrac{1}{2}, \]
\[ \|\widehat{q}_2\| = \left\langle t - \tfrac{1}{2},\, t - \tfrac{1}{2} \right\rangle^{1/2} = \tfrac{1}{\sqrt{12}}, \]
\[ q_2 = \|\widehat{q}_2\|^{-1}\, \widehat{q}_2 = \sqrt{12}\, t - \sqrt{3}. \]

This leads to the triangulation

\[ [f]_{\widehat{B},\widehat{B}} = \begin{bmatrix} 1 & \sqrt{3} \\ 0 & 2 \end{bmatrix} \in \mathbb{R}^{2,2}. \]

We could also choose $q_1 = \sqrt{3}\, t$, which is an eigenvector of $f$ with norm 1
corresponding to the eigenvalue 2. Orthogonalizing the vector $1$ against $q_1$ leads to
the second basis vector $q_2 = -3t + 2$. With the corresponding basis $B_1$ we obtain
the triangulation

\[ [f]_{B_1,B_1} = \begin{bmatrix} 2 & -\sqrt{3} \\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}. \]

This example shows that in the triangulation of $f$ the elements above the diagonal can
be different for different orthonormal bases. Only the diagonal elements are (except
for their order) uniquely determined, since they are the eigenvalues of $f$. A more
detailed statement about the uniqueness is given in Lemma 14.22.
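
The two triangulations of Example 14.19 can be reproduced numerically. The sketch below is only an illustration under stated assumptions: a polynomial $a_0 + a_1 t$ is stored as the pair `(a0, a1)`, the scalar product is evaluated by its closed form on such pairs, and the function names (`inner`, `orthonormal_pair`, `matrix_of_f`) are hypothetical. For an orthonormal basis $\{q_1, q_2\}$ the entry $(i, j)$ of the matrix representation is $\langle f(q_j), q_i \rangle$.

```python
import math

# A polynomial a0 + a1*t in R[t]_{<=1} is stored as the pair (a0, a1).

def inner(p, q):
    """Scalar product <p, q> = integral over [0, 1] of p(t) q(t) dt."""
    return p[0] * q[0] + (p[0] * q[1] + p[1] * q[0]) / 2 + p[1] * q[1] / 3

def f(p):
    """The endomorphism a1*t + a0 -> 2*a1*t + a0."""
    return (p[0], 2 * p[1])

def orthonormal_pair(q1, v):
    """Orthogonalize v against the unit vector q1 and normalize (Gram-Schmidt)."""
    c = inner(v, q1)
    w = (v[0] - c * q1[0], v[1] - c * q1[1])
    norm = math.sqrt(inner(w, w))
    return q1, (w[0] / norm, w[1] / norm)

def matrix_of_f(basis):
    """Matrix of f w.r.t. an orthonormal basis: entry (i, j) is <f(q_j), q_i>."""
    return [[inner(f(qj), qi) for qj in basis] for qi in basis]

first = orthonormal_pair((1.0, 0.0), (0.0, 1.0))             # q1 = 1
second = orthonormal_pair((0.0, math.sqrt(3)), (1.0, 0.0))   # q1 = sqrt(3)*t

for basis in (first, second):
    for row in matrix_of_f(basis):
        print([round(x, 6) for x in row])
```

Up to rounding, both matrices are upper triangular with the eigenvalues 1 and 2 on the diagonal, while the entry above the diagonal changes with the basis.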


In the next chapter we will prove the Fundamental Theorem of Algebra, which
states that every non-constant polynomial over $\mathbb{C}$ decomposes into linear factors.
This result has the following corollary, which is known as Schur’s theorem.¹


¹ Issai Schur (1875-1941).

Corollary 14.20 If $V$ is a finite dimensional unitary vector space, then every
endomorphism on $V$ can be unitarily triangulated, i.e., for each $f \in L(V, V)$ there exists
an orthonormal basis $B$ of $V$, such that $[f]_{B,B}$ is upper triangular. The matrix $[f]_{B,B}$
is called a Schur form of $f$.


If $V$ is the unitary vector space $\mathbb{C}^{n,1}$ with the standard scalar product, then we
obtain the following “matrix version” of Corollary 14.20.


Corollary 14.21 If $A \in \mathbb{C}^{n,n}$, then there exists a unitary matrix $Q \in \mathbb{C}^{n,n}$ with
$A = QRQ^H$ for an upper triangular matrix $R \in \mathbb{C}^{n,n}$. The matrix $R$ is called a
Schur form of $A$.


The following result shows that a Schur form of a matrix $A \in \mathbb{C}^{n,n}$ with $n$ pairwise
distinct eigenvalues is “almost unique”.


Lemma 14.22 Let $A \in \mathbb{C}^{n,n}$ have $n$ pairwise distinct eigenvalues, and let $R_1, R_2 \in
\mathbb{C}^{n,n}$ be two Schur forms of $A$. If the diagonals of $R_1$ and $R_2$ are equal, then
$R_1 = U R_2 U^H$ for a unitary diagonal matrix $U$.


Proof Exercise. □


A survey of the results on unitary similarity of matrices can be found in the 
article [Sha91]. 


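MATLAB-Minute.
Consider for $n \geq 2$ the matrix

\[ A = \begin{bmatrix} 1 & 2 & 3 & \cdots & n \\ 2 & 3 & 4 & \cdots & n+1 \\ 3 & 4 & 5 & \cdots & n+2 \\ \vdots & \vdots & \vdots & & \vdots \\ n & n+1 & n+2 & \cdots & 2n-1 \end{bmatrix} \in \mathbb{R}^{n,n}. \]

Compute a Schur form of $A$ using the command [U,R] = schur(A) for $n =
2, 3, 4, \ldots, 10$. What are the eigenvalues of $A$? Formulate a conjecture about the rank
of $A$ for general $n$. Can you prove your conjecture?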


Exercises 


(In the following exercises K is an arbitrary field.) 


14.1. Let $V$ be a vector space and let $f \in L(V, V)$ have the eigenvalue $\lambda$. Show
that $\mathrm{im}(\lambda\, \mathrm{Id}_V - f)$ is an $f$-invariant subspace.

14.2. Let $V$ be a finite dimensional vector space and let $f \in L(V, V)$ be bijective.
Show that $f$ and $f^{-1}$ have the same invariant subspaces.
14.3. Let $V$ be an $n$-dimensional $K$-vector space, let $f \in L(V, V)$, and let $U$ be an
$m$-dimensional $f$-invariant subspace of $V$. Show that a basis $B$ of $V$ exists
such that

\[ [f]_{B,B} = \begin{bmatrix} A_1 & A_2 \\ 0 & A_3 \end{bmatrix} \]

for some matrices $A_1 \in K^{m,m}$, $A_2 \in K^{m,n-m}$ and $A_3 \in K^{n-m,n-m}$.

14.4. Let $K \in \{\mathbb{R}, \mathbb{C}\}$ and $f : K^{4,1} \to K^{4,1}$, $v \mapsto Fv$ with

\[ F = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix}. \]

Compute $P_f$ and determine for $K = \mathbb{R}$ and $K = \mathbb{C}$ the eigenvalues of $f$
with their algebraic and geometric multiplicities, as well as the associated
eigenspaces.

14.5. Consider the vector space $\mathbb{R}[t]_{\leq n}$ with the standard basis $\{1, t, \ldots, t^n\}$ and
the endomorphism

\[ f : \mathbb{R}[t]_{\leq n} \to \mathbb{R}[t]_{\leq n}, \quad \sum_{i=0}^{n} \alpha_i t^i \mapsto \sum_{i=2}^{n} i(i-1)\, \alpha_i t^{i-2} = \frac{d^2}{dt^2}\, p. \]

Compute $P_f$, the eigenvalues of $f$ with their algebraic and geometric
multiplicities, and examine whether $f$ is diagonalizable or not. What changes if
one considers as map the $k$th derivative (for $k = 3, 4, \ldots, n$)?

14.6. Examine whether the following matrices 


01 ae bee a 
— 22 _ 3,3 = 4,4 
a=] 0| £2 , B= Ta EQ, C= 2224 EQ 
000 2 
are diagonalizable. 


14.7. Is the set of all diagonalizable and invertible matrices a subgroup of $GL_n(K)$?

14.8. Let $n \in \mathbb{N}_0$. Consider the $\mathbb{R}$-vector space $\mathbb{R}[t]_{\leq n}$ and the map

\[ f : \mathbb{R}[t]_{\leq n} \to \mathbb{R}[t]_{\leq n}, \quad p(t) \mapsto p(t + 1) - p(t). \]

Show that $f$ is linear. For which $n$ is $f$ diagonalizable?

14.9. Let $V$ be an $\mathbb{R}$-vector space with the basis $\{v_1, \ldots, v_n\}$. Examine whether the
following endomorphisms are diagonalizable or not:


(a) FO) Ht tas J Hl tt = 1 a7) = 0a; 
(FONS Et = yeg i = ian a

14.10. Let $V$ be a finite dimensional Euclidean vector space and let $f \in L(V, V)$
with $f + f^{ad} = 0 \in L(V, V)$. Show that $f \neq 0$ if and only if $f$ is not
diagonalizable.

14.11. Let $V$ be a $\mathbb{C}$-vector space and let $f \in L(V, V)$ with $f^2 = -\mathrm{Id}_V$. Determine
all possible eigenvalues of $f$.

14.12. Let $V$ be a finite dimensional vector space and $f \in L(V, V)$. Show that

14.13. Let $V$ be a finite dimensional $K$-vector space, let $f \in L(V, V)$ and

\[ p = (t - \mu_1) \cdot \ldots \cdot (t - \mu_m) \in K[t]_{\leq m}. \]

Show that $p(f)$ is bijective if and only if $\mu_1, \ldots, \mu_m$ are not eigenvalues of $f$.

14.14. Determine conditions for the entries of the matrices

\[ A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in \mathbb{R}^{2,2}, \]

such that $A$ is diagonalizable or can be triangulated.

14.15. Determine an endomorphism on $\mathbb{R}[t]_{\leq 1}$ that is not diagonalizable and that
cannot be triangulated.

14.16. Let $V$ be a vector space with $\dim(V) = n$. Show that $f \in L(V, V)$ can be
triangulated if and only if there exist subspaces $V_0, V_1, \ldots, V_n$ of $V$ with

(a) $V_j \subset V_{j+1}$ for $j = 0, 1, \ldots, n - 1$,
(b) $\dim(V_j) = j$ for $j = 0, 1, \ldots, n$, and
(c) $V_j$ is $f$-invariant for $j = 0, 1, \ldots, n$.

14.17. Prove Lemma 14.22.


Chapter 15 
Polynomials and the Fundamental Theorem 
of Algebra 


In this chapter we discuss polynomials in more detail. We consider the division
of polynomials and derive classical results from polynomial algebra, including the
factorization into irreducible factors. We also prove the Fundamental Theorem of
Algebra, which states that every non-constant polynomial over the complex numbers
has at least one complex root. This implies that every complex matrix and every
endomorphism on a (finite dimensional) complex vector space has at least one
eigenvalue.


15.1 Polynomials 


Let us recall some of the most important terms in the context of polynomials. If $K$
is a field, then

\[ p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \quad \text{with } n \in \mathbb{N}_0 \text{ and } \alpha_0, \alpha_1, \ldots, \alpha_n \in K \]

is a polynomial over $K$ in the variable $t$. The set $K[t]$ of all these polynomials forms a
commutative ring with unit (cp. Example 3.17). If $\alpha_n \neq 0$, then $\deg(p) = n$ is called
the degree of $p$. If $\alpha_n = 1$, then $p$ is called monic. If $p = 0$, then $\deg(p) := -\infty$,
and if $\deg(p) < 1$, then $p$ is called constant.


Lemma 15.1 For two polynomials $p, q \in K[t]$ the following assertions hold:


(1) $\deg(p + q) \leq \max\{\deg(p), \deg(q)\}$.
(2) $\deg(p \cdot q) = \deg(p) + \deg(q)$.


Proof Exercise. □


We now introduce some concepts associated with the division of polynomials. 
Definition 15.2 Let $K$ be a field.


(1) If for two polynomials $p, s \in K[t]$ there exists a polynomial $q \in K[t]$ with
$p = s \cdot q$, then $s$ is called a divisor of $p$ and we write $s \mid p$ (read this as “$s$ divides
$p$”).

(2) Two polynomials $p, s \in K[t]$ are called coprime, if $q \mid p$ and $q \mid s$ for some
$q \in K[t]$ always imply that $q$ is constant.

(3) A non-constant polynomial $p \in K[t]$ is called irreducible (over $K$), if $p = s \cdot q$
for two polynomials $s, q \in K[t]$ implies that $s$ or $q$ is constant. If there exist
two non-constant polynomials $s, q \in K[t]$ with $p = s \cdot q$, then $p$ is called
reducible (over $K$).


Note that the property of irreducibility is only defined for polynomials of degree 
at least 1. A polynomial of degree 1 is always irreducible. Whether a polynomial of 
degree at least 2 is irreducible may depend on the underlying field. 


Example 15.3 The polynomial $2 - t^2 \in \mathbb{Q}[t]$ is irreducible, but the factorization

\[ 2 - t^2 = (\sqrt{2} - t) \cdot (\sqrt{2} + t) \]

shows that $2 - t^2 \in \mathbb{R}[t]$ is reducible. The polynomial $1 + t^2 \in \mathbb{R}[t]$ is irreducible,
but using the imaginary unit $i$ we have

\[ 1 + t^2 = (-i + t) \cdot (i + t), \]

so that $1 + t^2 \in \mathbb{C}[t]$ is reducible.


The next result concerns the division with remainder of polynomials. 


Theorem 15.4 If $p \in K[t]$ and $s \in K[t] \setminus \{0\}$, then there exist uniquely defined
polynomials $q, r \in K[t]$ with

\[ p = s \cdot q + r \quad \text{and} \quad \deg(r) < \deg(s). \tag{15.1} \]


Proof We show first the existence of polynomials $q, r \in K[t]$ such that (15.1) holds.

If $\deg(s) = 0$, then $s = s_0$ for an $s_0 \in K \setminus \{0\}$ and (15.1) follows with $q := s_0^{-1} \cdot p$
and $r := 0$, where $\deg(r) < \deg(s)$.

We now assume that $\deg(s) \geq 1$. If $\deg(p) < \deg(s)$, then we set $q := 0$ and
$r := p$. Then $p = s \cdot q + r$ with $\deg(r) < \deg(s)$.

Let $n := \deg(p) \geq m := \deg(s) \geq 1$. We prove (15.1) by induction on $n$. If
$n = 1$, then $m = 1$. Hence $p = p_1 \cdot t + p_0$ with $p_1 \neq 0$ and $s = s_1 \cdot t + s_0$ with
$s_1 \neq 0$. Therefore,

\[ p = s \cdot q + r \quad \text{for} \quad q := p_1 s_1^{-1}, \quad r := p_0 - p_1 s_1^{-1} s_0, \]

where $\deg(r) < \deg(s)$.
Suppose that the assertion holds for an $n \geq 1$. Let two polynomials $p$ and $s$ with
$n + 1 = \deg(p) \geq \deg(s) = m$ be given, and let $p_{n+1}$ ($\neq 0$) and $s_m$ ($\neq 0$) be the
highest coefficients of $p$ and $s$. If

\[ h := p - p_{n+1} s_m^{-1}\, s \cdot t^{n+1-m} \in K[t], \]

then $\deg(h) < \deg(p) = n + 1$. By the induction hypothesis there exist polynomials
$\widetilde{q}, r \in K[t]$ with

\[ h = s \cdot \widetilde{q} + r \quad \text{and} \quad \deg(r) < \deg(s). \]

It then follows that

\[ p = s \cdot q + r \quad \text{with} \quad q := \widetilde{q} + p_{n+1} s_m^{-1}\, t^{n+1-m}, \]

where $\deg(r) < \deg(s)$.


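It remains to show the uniqueness. Suppose that (15.1) holds and that there exist
polynomials $\widetilde{q}, \widetilde{r} \in K[t]$ with $p = s \cdot \widetilde{q} + \widetilde{r}$ and $\deg(\widetilde{r}) < \deg(s)$. Then

\[ r - \widetilde{r} = s \cdot (\widetilde{q} - q). \]

If $r - \widetilde{r} \neq 0$, then $\widetilde{q} - q \neq 0$ and thus

\[ \deg(r - \widetilde{r}) = \deg(s \cdot (\widetilde{q} - q)) = \deg(s) + \deg(\widetilde{q} - q) \geq \deg(s). \]

On the other hand, we also have

\[ \deg(r - \widetilde{r}) \leq \max\{\deg(r), \deg(\widetilde{r})\} < \deg(s). \]

This is a contradiction, which shows that indeed $r = \widetilde{r}$ and $q = \widetilde{q}$. □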
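
The existence part of the proof is constructive: the leading term of the current remainder is cancelled repeatedly, exactly as $h$ is formed from $p$ in the induction step. A minimal Python sketch over $K = \mathbb{Q}$ follows; polynomials are stored as coefficient lists `[a0, a1, ..., an]`, and the names `degree` and `divmod_poly` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def degree(p):
    """Degree of a coefficient list [a0, a1, ...]; the zero polynomial gets -inf."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return float("-inf")

def divmod_poly(p, s):
    """Return (q, r) with p = s*q + r and deg(r) < deg(s), for s != 0."""
    m = degree(s)
    if m == float("-inf"):
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in p]
    q = [Fraction(0)] * max(len(p) - m, 1)
    while degree(r) >= m:
        n = degree(r)
        # cancel the leading term of r, as in the induction step of the proof
        c = r[n] / Fraction(s[m])
        q[n - m] += c
        for k in range(m + 1):
            r[n - m + k] -= c * s[k]
    return q, r

# (t^3 - t^2 + t - 1) divided by (t - 1)
q, r = divmod_poly([-1, 1, -1, 1], [-1, 1])
print(q, r)
```

Exact fractions are used so that the test `degree(r) >= m` is not affected by rounding; with floating point coefficients the leading term would in general not cancel exactly.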


This theorem has some important consequences for the roots of polynomials. The
first of these is known as the Theorem of Ruffini.¹


Corollary 15.5 If $\lambda \in K$ is a root of $p \in K[t]$, i.e., $p(\lambda) = 0$, then there exists a
uniquely determined polynomial $q \in K[t]$ with $p = (t - \lambda) \cdot q$.


Proof When we apply Theorem 15.4 to the polynomials $p$ and $s = t - \lambda \neq 0$, then
we get uniquely determined polynomials $q$ and $r$ with $\deg(r) < \deg(s) = 1$ and

\[ p = (t - \lambda) \cdot q + r. \]

The polynomial $r$ is constant and evaluating it at $\lambda$ gives

\[ 0 = p(\lambda) = (\lambda - \lambda) \cdot q(\lambda) + r(\lambda) = r(\lambda), \]

which yields $r = 0$ and $p = (t - \lambda) \cdot q$. □


¹ Paolo Ruffini (1765-1822).


If a polynomial $p \in K[t]$ has at least degree 2 and a root $\lambda \in K$, then the linear
factor $t - \lambda$ is a divisor of $p$ and, in particular, $p$ is reducible. The converse of this
statement does not hold. For instance, the polynomial $4 - 4t^2 + t^4 = (2 - t^2) \cdot (2 - t^2) \in
\mathbb{Q}[t]$ is reducible, but it does not have a root in $\mathbb{Q}$.

Corollary 15.5 motivates the following definition. 


Definition 15.6 If $\lambda \in K$ is a root of $p \in K[t] \setminus \{0\}$, then its multiplicity is the
uniquely determined nonnegative integer $m$, such that $p = (t - \lambda)^m \cdot q$ for a
polynomial $q \in K[t]$ with $q(\lambda) \neq 0$.


Recursive application of Corollary 15.5 to a given polynomial $p \in K[t]$ leads to
the following result.


Corollary 15.7 If $\lambda_1, \ldots, \lambda_k \in K$ are pairwise distinct roots of $p \in K[t] \setminus \{0\}$ with
the corresponding multiplicities $m_1, \ldots, m_k$, then there exists a unique polynomial
$q \in K[t]$ with

\[ p = (t - \lambda_1)^{m_1} \cdot \ldots \cdot (t - \lambda_k)^{m_k} \cdot q \]

and $q(\lambda_j) \neq 0$ for $j = 1, \ldots, k$. In particular, the sum of the multiplicities of all
pairwise distinct roots of $p$ is at most $\deg(p)$.
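
The recursive use of Corollary 15.5 can be sketched in a few lines: divide by $t - \lambda$ as long as the remainder vanishes and count the steps. The division by a linear factor is carried out with the Horner scheme. This is an illustrative sketch over $\mathbb{Q}$ with coefficient lists `[a0, a1, ..., an]`; the names `divide_by_linear` and `multiplicity` are assumptions made for this example.

```python
from fractions import Fraction

def divide_by_linear(p, lam):
    """Horner scheme: return (q, rem) with p = (t - lam)*q + rem, rem = p(lam).

    Polynomials are coefficient lists [a0, a1, ..., an].
    """
    q = [Fraction(0)] * (len(p) - 1)
    carry = Fraction(0)
    for k in range(len(p) - 1, 0, -1):
        carry = Fraction(p[k]) + lam * carry
        q[k - 1] = carry
    rem = Fraction(p[0]) + lam * carry
    return q, rem

def multiplicity(p, lam):
    """Largest m such that (t - lam)^m divides the nonzero polynomial p."""
    if all(c == 0 for c in p):
        raise ValueError("the zero polynomial has no well-defined multiplicity")
    m = 0
    while len(p) > 1:
        q, rem = divide_by_linear(p, lam)
        if rem != 0:
            break
        p, m = q, m + 1
    return m

# p = (t - 1)^2 (t + 2) = t^3 - 3t + 2
p = [2, -3, 0, 1]
print(multiplicity(p, 1), multiplicity(p, -2), multiplicity(p, 0))
```

A value that is not a root is reported with multiplicity 0, which is consistent with Definition 15.6 if one allows $m = 0$.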


The next result is known as the Lemma of Bézout.²


Lemma 15.8 If $p, s \in K[t] \setminus \{0\}$ are coprime, then there exist polynomials $q_1, q_2 \in
K[t]$ with

\[ p \cdot q_1 + s \cdot q_2 = 1. \]


Proof We may assume without loss of generality that $\deg(p) \geq \deg(s)$ ($\geq 0$), and
we proceed by induction on $\deg(s)$.
If $\deg(s) = 0$, then $s = s_0$ for an $s_0 \in K \setminus \{0\}$, and thus

\[ p \cdot q_1 + s \cdot q_2 = 1 \quad \text{with} \quad q_1 := 0, \quad q_2 := s_0^{-1}. \]

Suppose that the assertion holds for all polynomials $p, s \in K[t] \setminus \{0\}$ with
$\deg(s) = n$ for an $n \geq 0$. Let $p, s \in K[t] \setminus \{0\}$ with $\deg(p) \geq \deg(s) = n + 1$ be
given. By Theorem 15.4 there exist polynomials $q$ and $r$ with

\[ p = s \cdot q + r \quad \text{and} \quad \deg(r) < \deg(s). \]

Here we have $r \neq 0$, since by assumption $p$ and $s$ are coprime.
Suppose that there exists a non-constant polynomial $h \in K[t]$ that divides both
$s$ and $r$. Then $h$ also divides $p$, in contradiction to the assumption that $p$ and $s$ are
coprime. Thus, the polynomials $s$ and $r$ are coprime. Since $\deg(r) < \deg(s)$, we can
apply the induction hypothesis to the polynomials $s, r \in K[t] \setminus \{0\}$. Hence there
exist polynomials $\widetilde{q}_1, \widetilde{q}_2 \in K[t]$ with

\[ s \cdot \widetilde{q}_1 + r \cdot \widetilde{q}_2 = 1. \]

From $r = p - s \cdot q$ we then get

\[ 1 = s \cdot \widetilde{q}_1 + (p - s \cdot q) \cdot \widetilde{q}_2 = p \cdot \widetilde{q}_2 + s \cdot (\widetilde{q}_1 - q \cdot \widetilde{q}_2), \]

which completes the proof. □


² Étienne Bézout (1730-1783).
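
The induction in this proof is the extended Euclidean algorithm in disguise: each step replaces the pair $(p, s)$ by $(s, r)$ and carries the coefficients along. The following Python sketch over $\mathbb{Q}$ is an iterative variant of that idea and not the recursion of the proof verbatim; polynomials are coefficient lists `[a0, a1, ...]` with the zero polynomial stored as `[]`, and all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, s):
    """Division with remainder: p = s*q + r with deg(r) < deg(s), s trimmed, s != 0."""
    q, r = [], trim(p)
    while len(r) >= len(s):
        c, shift = r[-1] / s[-1], len(r) - len(s)
        term = [Fraction(0)] * shift + [c]
        q = add(q, term)
        r = add(r, mul(term, [-x for x in s]))
    return q, r

def bezout(p, s):
    """Return (q1, q2) with p*q1 + s*q2 = 1 for coprime nonzero p, s."""
    p = trim([Fraction(c) for c in p])
    s = trim([Fraction(c) for c in s])
    # invariants: a = p*x0 + s*y0 and b = p*x1 + s*y1
    a, b = p, s
    x0, y0, x1, y1 = [Fraction(1)], [], [], [Fraction(1)]
    while b:
        q, r = divmod_poly(a, b)
        neg_q = [-c for c in q]
        a, b = b, r
        x0, x1 = x1, add(x0, mul(neg_q, x1))
        y0, y1 = y1, add(y0, mul(neg_q, y1))
    if len(a) != 1:
        raise ValueError("p and s are not coprime")
    inv = [1 / a[0]]  # scale the constant common divisor to 1
    return mul(x0, inv), mul(y0, inv)

# p = t^2 - 2 and s = t are coprime in Q[t]
q1, q2 = bezout([-2, 0, 1], [0, 1])
print(q1, q2)
```

For $p = t^2 - 2$ and $s = t$ the sketch returns $q_1 = -\tfrac{1}{2}$ and $q_2 = \tfrac{1}{2} t$, and indeed $(t^2 - 2)\cdot(-\tfrac{1}{2}) + t \cdot \tfrac{1}{2} t = 1$.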


Using the Lemma of Bézout we can easily prove the following result. 


Lemma 15.9 If $p \in K[t]$ is irreducible and a divisor of the product $s \cdot h$ of two
polynomials $s, h \in K[t]$, then $p$ divides at least one of the factors, i.e., $p \mid s$ or $p \mid h$.


Proof If $s = 0$, then $p \mid s$, because every polynomial is a divisor of the zero
polynomial.

If $s \neq 0$ and $p$ is not a divisor of $s$, then $p$ and $s$ are coprime, since $p$ is irreducible.
By Lemma 15.8 there exist polynomials $q_1, q_2 \in K[t]$ with $p \cdot q_1 + s \cdot q_2 = 1$, and
hence

\[ h = h \cdot 1 = (q_1 \cdot h) \cdot p + q_2 \cdot (s \cdot h). \]

The polynomial $p$ divides both terms on the right hand side, and thus also $p \mid h$. □


By recursive application of Lemma 15.9 we obtain the Euclidean theorem, which 
describes a prime factor decomposition in the ring of polynomials. 


Theorem 15.10 Every polynomial $p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \in K[t] \setminus \{0\}$ has a
unique (up to the ordering of the factors) decomposition

\[ p = \mu \cdot p_1 \cdot \ldots \cdot p_k \]

with $\mu \in K$ and monic irreducible polynomials $p_1, \ldots, p_k \in K[t]$.


Proof If $\deg(p) = 0$, and thus $p = \alpha_0$, then the assertion holds with $k = 0$ and
$\mu = \alpha_0$.

Let $\deg(p) \geq 1$. If $p$ is irreducible, then the assertion holds with $p_1 = \mu^{-1} p$
and $\mu = \alpha_n$. If $p$ is reducible, then $p = p_1 \cdot p_2$ for two non-constant polynomials
$p_1$ and $p_2$. These are either irreducible, or we can decompose them further. Every
multiplicative decomposition of $p$ that is obtained in this way has at most $\deg(p) = n$
non-constant factors. Suppose that

\[ p = \mu \cdot p_1 \cdot \ldots \cdot p_k = \beta \cdot q_1 \cdot \ldots \cdot q_\ell \tag{15.2} \]

for some $k, \ell$, where $1 \leq \ell \leq k \leq n$, $\mu, \beta \in K$, as well as monic irreducible
polynomials $p_1, \ldots, p_k, q_1, \ldots, q_\ell \in K[t]$. Then $p_1 \mid p$ and hence $p_1 \mid q_j$ for some $j$.
Since the polynomials $p_1$ and $q_j$ are irreducible, we must have $p_1 = q_j$.
We may assume without loss of generality that $j = 1$ and cancel the polynomial
$p_1 = q_1$ in the identity (15.2), which gives

\[ \mu \cdot p_2 \cdot \ldots \cdot p_k = \beta \cdot q_2 \cdot \ldots \cdot q_\ell. \]

Proceeding analogously for the polynomials $p_2, \ldots, p_k$, we finally obtain $k = \ell$,
$\mu = \beta$ and $p_j = q_j$ for $j = 1, \ldots, k$. □


15.2 The Fundamental Theorem of Algebra 


We have seen above that the existence of roots of a polynomial depends on the
field over which it is considered. The field $\mathbb{C}$ is special in this sense, since here the
Fundamental Theorem of Algebra³ guarantees that every non-constant polynomial
has a root. In order to use this theorem in our context, we first present an equivalent
formulation in the language of Linear Algebra.


Theorem 15.11 The following statements are equivalent:


(1) Every non-constant polynomial $p \in \mathbb{C}[t]$ has a root in $\mathbb{C}$.
(2) If $V \neq \{0\}$ is a finite dimensional $\mathbb{C}$-vector space, then every endomorphism
$f \in L(V, V)$ has an eigenvector.


Proof


(1) $\Rightarrow$ (2): If $V \neq \{0\}$ and $f \in L(V, V)$, then the characteristic polynomial $P_f \in
\mathbb{C}[t]$ is non-constant, since $\deg(P_f) = \dim(V) > 0$. Thus, $P_f$ has a root in $\mathbb{C}$,
which is an eigenvalue of $f$, so that $f$ indeed has an eigenvector.

(2) $\Rightarrow$ (1): Let $p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \in \mathbb{C}[t]$ be a non-constant polynomial
with $\alpha_n \neq 0$. The roots of $p$ are equal to the roots of the monic polynomial
$\widetilde{p} := \alpha_n^{-1} p$. Let $A \in \mathbb{C}^{n,n}$ be the companion matrix of $\widetilde{p}$, then $P_A = \widetilde{p}$ (cp.
Lemma 8.4).

If $V$ is an $n$-dimensional $\mathbb{C}$-vector space and $B$ is an arbitrary basis of $V$, then
there exists a uniquely determined $f \in L(V, V)$ with $[f]_{B,B} = A$ (cp. Theorem 10.16).
By assumption, $f$ has an eigenvector and hence also an eigenvalue,
so that $\widetilde{p} = P_A$ has a root. □


The Fundamental Theorem of Algebra cannot be proven without tools from Analy- 
sis. In particular, one needs that polynomials are continuous. We will use the follow- 
ing standard result, which is based on the continuity of polynomials. 


Lemma 15.12 Every polynomial $p \in \mathbb{R}[t]$ with odd degree has a (real) root.


Proof Let the highest coefficient of $p$ be positive. Then

\[ \lim_{t \to \infty} p(t) = +\infty, \qquad \lim_{t \to -\infty} p(t) = -\infty. \]

Since the real function $p(t)$ is continuous, the Intermediate Value Theorem from
Analysis implies the existence of a root of $p$. The argument in the case of a negative
leading coefficient is analogous. □


³ Numerous proofs of this important result exist. Carl Friedrich Gauß (1777-1855) alone gave four
different proofs, starting with the one in his dissertation from 1799, which, however, contained a
gap. The history of this result is described in detail in the book [Ebb91].


Our proof of the Fundamental Theorem of Algebra below follows the presentation
in the article [Der03]. The proof is by induction on the dimension of $V$. However,
we do not use the usual consecutive order, i.e., $\dim(V) = 1, 2, 3, \ldots$, but an order
that is based on the sets

\[ M_j := \{2^m \ell \mid 0 \leq m \leq j - 1,\ \ell \text{ odd}\} \subset \mathbb{N}, \quad j = 1, 2, 3, \ldots \]

For instance,

\[ M_1 = \{\ell \mid \ell \text{ odd}\} = \{1, 3, 5, 7, \ldots\}, \qquad M_2 = M_1 \cup \{2, 6, 10, 14, \ldots\}. \]
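
To get a feeling for this ordering, membership in $M_j$ can be tested by stripping factors of 2. The short Python sketch below is an illustration only, and the function name `in_M` is an assumption. It also makes visible that $M_j$ consists exactly of the natural numbers not divisible by $2^j$.

```python
def in_M(d, j):
    """True if d = 2**m * l with 0 <= m <= j-1 and l odd, i.e. d lies in M_j."""
    if d < 1 or j < 1:
        return False
    m = 0
    while d % 2 == 0:
        d //= 2
        m += 1
    return m <= j - 1

M1 = [d for d in range(1, 16) if in_M(d, 1)]
M2 = [d for d in range(1, 16) if in_M(d, 2)]
print(M1)  # [1, 3, 5, 7, 9, 11, 13, 15]
print(M2)  # [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15]
```

Since $M_1 \subset M_2 \subset M_3 \subset \ldots$ and every natural number lies in some $M_j$, an induction over $j$ eventually reaches every dimension.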


Lemma 15.13


(1) If $V$ is an $\mathbb{R}$-vector space and if $\dim(V)$ is odd, i.e., $\dim(V) \in M_1$, then every
$f \in L(V, V)$ has an eigenvector.

(2) Let $K$ be a field and $j \in \mathbb{N}$. If for every $K$-vector space $V$ with $\dim(V) \in M_j$
every $f \in L(V, V)$ has an eigenvector, then two commuting $f_1, f_2 \in L(V, V)$
have a common eigenvector. That is, if $f_1 \circ f_2 = f_2 \circ f_1$, then there exists a vector
$v \in V \setminus \{0\}$ and two scalars $\lambda_1, \lambda_2 \in K$ with $f_1(v) = \lambda_1 v$ and $f_2(v) = \lambda_2 v$.

(3) If $V$ is an $\mathbb{R}$-vector space and if $\dim(V)$ is odd, then two commuting $f_1, f_2 \in
L(V, V)$ have a common eigenvector.


Proof


(1) For every $f \in L(V, V)$ the degree of $P_f \in \mathbb{R}[t]$ is odd. Hence Lemma 15.12
implies that $P_f$ has a root, and therefore $f$ has an eigenvector.

(2) We proceed by induction on $\dim(V)$, where $\dim(V)$ runs through the elements of
$M_j$ in increasing order. The set $M_j$ is a proper subset of $\mathbb{N}$ consisting of natural
numbers that are not divisible by $2^j$ and, in particular, 1 is the smallest element
of $M_j$.

If $\dim(V) = 1 \in M_j$, then by assumption two arbitrary $f_1, f_2 \in L(V, V)$ each
have an eigenvector, i.e.,

\[ f_1(v_1) = \lambda_1 v_1, \qquad f_2(v_2) = \lambda_2 v_2. \]

Since $\dim(V) = 1$, we have $v_1 = \alpha v_2$ for an $\alpha \in K \setminus \{0\}$. Thus,

\[ f_2(v_1) = f_2(\alpha v_2) = \alpha f_2(v_2) = \lambda_2 (\alpha v_2) = \lambda_2 v_1, \]
i.e., $v_1$ is a common eigenvector of $f_1$ and $f_2$.

Let now $\dim(V) \in M_j$, and let the assertion be proven for all $K$-vector spaces
whose dimension is an element of $M_j$ that is smaller than $\dim(V)$. Let $f_1, f_2 \in
L(V, V)$ with $f_1 \circ f_2 = f_2 \circ f_1$. By assumption, $f_1$ has an eigenvector $v_1$ with
corresponding eigenvalue $\lambda_1$. Let

\[ U := \mathrm{im}(\lambda_1 \mathrm{Id}_V - f_1), \qquad W := V_{f_1}(\lambda_1) = \ker(\lambda_1 \mathrm{Id}_V - f_1). \]

The subspaces $U$ and $W$ of $V$ are $f_1$-invariant, i.e., $f_1(U) \subseteq U$ and $f_1(W) \subseteq W$.
For the space $W$ we have shown this in Lemma 14.4 and for the space $U$ this can
be easily shown as well (cp. Exercise 14.1). The subspaces $U$ and $W$ are also
$f_2$-invariant:

If $u \in U$, then $u = (\lambda_1 \mathrm{Id}_V - f_1)(v)$ for a $v \in V$. Since $f_1$ and $f_2$ commute, we
have

\[ f_2(u) = (f_2 \circ (\lambda_1 \mathrm{Id}_V - f_1))(v) = ((\lambda_1 \mathrm{Id}_V - f_1) \circ f_2)(v) = (\lambda_1 \mathrm{Id}_V - f_1)(f_2(v)) \in U. \]

If $w \in W$, then

\[ (\lambda_1 \mathrm{Id}_V - f_1)(f_2(w)) = ((\lambda_1 \mathrm{Id}_V - f_1) \circ f_2)(w) = (f_2 \circ (\lambda_1 \mathrm{Id}_V - f_1))(w) = f_2((\lambda_1 \mathrm{Id}_V - f_1)(w)) = f_2(0) = 0, \]

hence $f_2(w) \in W$.

We have $\dim(V) = \dim(U) + \dim(W)$ and since $\dim(V)$ is not divisible by $2^j$,
either $\dim(U)$ or $\dim(W)$ is not divisible by $2^j$. Hence either $\dim(U) \in M_j$ or
$\dim(W) \in M_j$.

If the corresponding subspace is a proper subspace of $V$, then its dimension is
an element of $M_j$ that is smaller than $\dim(V)$. By the induction hypothesis then
$f_1$ and $f_2$ have a common eigenvector in this subspace. Thus, $f_1$ and $f_2$ have a
common eigenvector in $V$.

If the corresponding subspace is equal to $V$, then this must be the subspace $W$,
since $\dim(W) \geq 1$. But if $V = W$, then every vector in $V \setminus \{0\}$ is an eigenvector
of $f_1$. By assumption also $f_2$ has an eigenvector, so that there exists at least one
common eigenvector of $f_1$ and $f_2$.

(3) By (1) it follows that the assumption of (2) holds for $K = \mathbb{R}$ and $j = 1$, which
means that (3) holds as well. □


We will now prove the Fundamental Theorem of Algebra in the formulation (2)
of Theorem 15.11.


Theorem 15.14 If $V \neq \{0\}$ is a finite dimensional $\mathbb{C}$-vector space, then every
$f \in L(V, V)$ has an eigenvector.
Proof We prove the assertion by induction on $j = 1, 2, 3, \ldots$ and $\dim(V) \in M_j$.

We start with $j = 1$ and thus by showing the assertion for all $\mathbb{C}$-vector spaces of
odd dimension. Let $V$ be an arbitrary $\mathbb{C}$-vector space with $n := \dim(V) \in M_1$. Let
$f \in L(V, V)$ and consider an arbitrary scalar product on $V$ (such a scalar product
always exists; cp. Exercise 12.1), as well as the set of self-adjoint maps with respect
to this scalar product,

\[ H := \{g \in L(V, V) \mid g = g^{ad}\}. \]

By Lemma 13.15 the set $H$ forms an $\mathbb{R}$-vector space of dimension $n^2$. If we define
$h_1, h_2 \in L(H, H)$ by

\[ h_1(g) := \frac{1}{2}\,(f \circ g + g \circ f^{ad}), \qquad h_2(g) := \frac{1}{2i}\,(f \circ g - g \circ f^{ad}) \]

for all $g \in H$, then $h_1 \circ h_2 = h_2 \circ h_1$ (cp. Exercise 15.8). Since $n$ is odd, also $n^2$ is
odd. By (3) in Lemma 15.13, $h_1$ and $h_2$ have a common eigenvector. Hence, there
exists a $g \in H \setminus \{0\}$ with

\[ h_1(g) = \lambda_1 g, \quad h_2(g) = \lambda_2 g \quad \text{for some } \lambda_1, \lambda_2 \in \mathbb{R}. \]

We have $(h_1 + i h_2)(g) = f \circ g$ for all $g \in H$ and therefore, in particular,

\[ f \circ g = (h_1 + i h_2)(g) = (\lambda_1 + i \lambda_2)\, g. \]

Since $g \neq 0$, there exists a $v \in V$ with $g(v) \neq 0$. Then

\[ f(g(v)) = (\lambda_1 + i \lambda_2)\,(g(v)), \]

which shows that $g(v) \in V$ is an eigenvector of $f$, so that the proof for $j = 1$ is
complete.

Assume now that for some $j \geq 1$ and every $\mathbb{C}$-vector space $V$ with $\dim(V) \in M_j$,
every $f \in L(V, V)$ has an eigenvector. Then (2) in Lemma 15.13 implies that every
two commuting $f_1, f_2 \in L(V, V)$ have a common eigenvector.

We have to show that for every $\mathbb{C}$-vector space $V$ with $\dim(V) \in M_{j+1}$, every
$f \in L(V, V)$ has an eigenvector. Since

\[ M_{j+1} = M_j \cup \{2^j q \mid q \text{ odd}\}, \]

we only have to prove this for $\mathbb{C}$-vector spaces $V$ with $n := \dim(V) = 2^j q$ for odd $q$.
Let $V$ be such a vector space and let $f \in L(V, V)$ be given. We choose an arbitrary
basis of $V$ and denote the matrix representation of $f$ with respect to this basis by
$A \in \mathbb{C}^{n,n}$. Let

\[ S := \{B \in \mathbb{C}^{n,n} \mid B = B^T\} \]

be the set of complex symmetric $n \times n$ matrices. If we define $h_1, h_2 \in L(S, S)$ by

\[ h_1(B) := AB + BA^T, \qquad h_2(B) := ABA^T \]

for all $B \in S$, then $h_1 \circ h_2 = h_2 \circ h_1$ (cp. Exercise 15.9). By Lemma 13.16 the set
$S$ forms a $\mathbb{C}$-vector space of dimension $n(n + 1)/2$. We have $n = 2^j q$ for an odd
natural number $q$. Thus,

\[ \frac{n(n+1)}{2} = \frac{2^j q\,(2^j q + 1)}{2} = 2^{j-1} q \cdot (2^j q + 1) \in M_j. \]

By the induction hypothesis, the commuting endomorphisms $h_1$ and $h_2$ have a
common eigenvector. Hence there exists a $B \in S \setminus \{0\}$ with

\[ h_1(B) = \lambda_1 B, \quad h_2(B) = \lambda_2 B \quad \text{for some } \lambda_1, \lambda_2 \in \mathbb{C}. \]

In particular, we have $\lambda_1 B = AB + BA^T$. Multiplying this equation from the left
with $A$ yields

\[ \lambda_1 AB = A^2 B + ABA^T = A^2 B + h_2(B) = A^2 B + \lambda_2 B, \]

so that

\[ (A^2 - \lambda_1 A + \lambda_2 I_n)\, B = 0. \]

We now factorize $t^2 - \lambda_1 t + \lambda_2 = (t - \alpha)(t - \beta)$ with

\[ \alpha = \frac{\lambda_1 + \sqrt{\lambda_1^2 - 4\lambda_2}}{2}, \qquad \beta = \frac{\lambda_1 - \sqrt{\lambda_1^2 - 4\lambda_2}}{2}, \]

where we have used that every complex number has a square root. Then

\[ (A - \alpha I_n)(A - \beta I_n)\, B = 0. \]

Since $B \neq 0$, there exists a $v \in \mathbb{C}^{n,1}$ with $Bv \neq 0$. If $(A - \beta I_n) Bv = 0$, then $Bv$ is
an eigenvector of $A$ corresponding to the eigenvalue $\beta$. If $(A - \beta I_n) Bv \neq 0$, then
$(A - \beta I_n) Bv$ is an eigenvector of $A$ corresponding to the eigenvalue $\alpha$. Since $A$ has
an eigenvector, also $f$ has an eigenvector. □
MATLAB-Minute.
Compute the eigenvalues of the matrix 


12345 

| DL BSS 
A=|23415|eR>° 

51423 

APB 5 


using the command eig(A). 

By definition a real matrix A can only have real eigenvalues. The reason for the 
occurrence of complex eigenvalues is that MATLAB interprets every matrix 
as a complex matrix. This means that within MATLAB every matrix can be 
unitarily triangulated, since every complex polynomial (of degree at least 1) 
decomposes into linear factors. 


As a direct corollary of the Fundamental Theorem of Algebra and (2) in 
Lemma 15.13 we have the following result. 


Corollary 15.15 If $V \neq \{0\}$ is a finite dimensional $\mathbb{C}$-vector space, then two
commuting $f_1, f_2 \in L(V, V)$ have a common eigenvector.


Example 15.16 The two complex $2 \times 2$ matrices

\[ A = \begin{bmatrix} i & 1 \\ 1 & i \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2i & 1 \\ 1 & 2i \end{bmatrix} \]

commute. The eigenvalues of $A$ are $\pm 1 + i$ and those of $B$ are $\pm 1 + 2i$. Hence $A$
and $B$ do not have a common eigenvalue, while $[1, 1]^T$ and $[-1, 1]^T$ are common
eigenvectors of $A$ and $B$.
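
The claims of Example 15.16 are easy to confirm with Python's built-in complex numbers. The sketch below assumes the matrices $A = \begin{bmatrix} i & 1 \\ 1 & i \end{bmatrix}$ and $B = \begin{bmatrix} 2i & 1 \\ 1 & 2i \end{bmatrix}$ from the example; the helper names `matmul` and `eigenvalue_for` are illustrative and work only for $2 \times 2$ matrices and nonzero vectors.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalue_for(X, v, tol=1e-12):
    """Return lam with X v = lam v if the nonzero vector v is an eigenvector, else None."""
    w = [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]
    k = 0 if v[0] != 0 else 1          # index of a nonzero entry of v
    lam = w[k] / v[k]
    ok = all(abs(w[i] - lam * v[i]) < tol for i in range(2))
    return lam if ok else None

A = [[1j, 1], [1, 1j]]
B = [[2j, 1], [1, 2j]]

print(matmul(A, B) == matmul(B, A))    # do A and B commute?
for v in ([1, 1], [-1, 1]):
    print(v, eigenvalue_for(A, v), eigenvalue_for(B, v))
```

Each of the two vectors is an eigenvector of both matrices, but with different eigenvalues for $A$ and for $B$, so a common eigenvector does not require a common eigenvalue.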


Using Corollary 15.15, Schur’s theorem (Corollary 14.20) can be generalized as
follows.


Theorem 15.17 If $V \neq \{0\}$ is a finite dimensional unitary vector space and $f_1, f_2 \in
L(V, V)$ commute, then $f_1$ and $f_2$ can be simultaneously unitarily triangulated, i.e.,
there exists an orthonormal basis $B$ of $V$, such that $[f_1]_{B,B}$ and $[f_2]_{B,B}$ are both
upper triangular.


Proof Exercise. □
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Exercises 


(In the following exercises K is an arbitrary field.) 


15.1.
15.2.


15.3.


15.4.


15.5.


15.6. 


15.7. 


15.8. 


Prove Lemma 15.1. 
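Show the following assertions for $p_1, p_2, p_3 \in K[t]$:


(a) $p_1 \mid (p_1 p_2)$.

(b) $p_1 \mid p_2$ and $p_2 \mid p_3$ imply that $p_1 \mid p_3$.

(c) $p_1 \mid p_2$ and $p_1 \mid p_3$ imply that $p_1 \mid (p_2 + p_3)$.

(d) If $p_1 \mid p_2$ and $p_2 \mid p_1$, then there exists a $c \in K \setminus \{0\}$ with $p_1 = c\, p_2$.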


Examine whether the following polynomials are irreducible: 


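\[
\begin{array}{ll}
p_1 = t^3 - t^2 + t - 1 \in \mathbb{Q}[t], & p_4 = 4t^3 - 4t^2 - t + 1 \in \mathbb{Q}[t], \\
p_2 = t^3 - t^2 + t - 1 \in \mathbb{R}[t], & p_5 = 4t^3 - 4t^2 - t + 1 \in \mathbb{R}[t], \\
p_3 = t^3 - t^2 + t - 1 \in \mathbb{C}[t], & p_6 = 4t^3 - 4t^2 - t + 1 \in \mathbb{C}[t].
\end{array}
\]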


Determine the decompositions into irreducible factors. 

Decompose the polynomials pı = t° — 2, p = t? + 2, p3 = tf — 1 and 
p4 = t? +t + 1 into irreducible factors over the fields K = Q, K = R and 
$K = \mathbb{C}$.

Show the following assertions for p € K[t]: 


(a) If deg(p) = 1, then p is irreducible. 

(b) If $\deg(p) \ge 2$ and $p$ has a root, then $p$ is not irreducible.

(c) If deg(p) € {2, 3}, then p is irreducible if and only if p does not have a 
root. 


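Let $A \in GL_n(\mathbb{C})$, $n \ge 2$, and let $\mathrm{adj}(A) \in \mathbb{C}^{n,n}$ be the adjunct of $A$. Show
that there exist $n - 1$ matrices $A_j \in \mathbb{C}^{n,n}$ with $\det(-A_j) = \det(A)$,
$j = 1, \dots, n-1$, and

\[ \mathrm{adj}(A) = \prod_{j=1}^{n-1} A_j. \]

(Hint: Use $P_A$ to construct a polynomial $p \in \mathbb{C}[t]_{\le n-1}$ with $\mathrm{adj}(A) = p(A)$
and express $p$ as product of linear factors.)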

Show that two polynomials $p, q \in \mathbb{C}[t] \setminus \{0\}$ have a common root if and only
if there exist polynomials $r_1, r_2 \in \mathbb{C}[t]$ with $0 \le \deg(r_1) < \deg(p)$,
$0 \le \deg(r_2) < \deg(q)$ and $p \cdot r_2 + q \cdot r_1 = 0$.

Let $V$ be a finite dimensional unitary vector space, $f \in \mathcal{L}(V, V)$,
$H = \{g \in \mathcal{L}(V, V) \mid g = g^{ad}\}$ and let


l ad 
hy}: H->LY,V), gr a CR TE OT j; 
15.9.


15.10. 


15.11. 


15.12.
15.13. 


15.14. 


15.15. 


15.16. 


15.17. 


l ad 
hy: H > L(Y, V), n OR Oy yi 


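Show that $h_1, h_2 \in \mathcal{L}(H, H)$ and $h_1 \circ h_2 = h_2 \circ h_1$.
Let $A \in \mathbb{C}^{n,n}$, $S = \{B \in \mathbb{C}^{n,n} \mid B = B^H\}$ and let

\[ h_1 : S \to \mathbb{C}^{n,n}, \quad B \mapsto AB + BA^H, \]
\[ h_2 : S \to \mathbb{C}^{n,n}, \quad B \mapsto ABA^H. \]

Show that $h_1, h_2 \in \mathcal{L}(S, S)$ and $h_1 \circ h_2 = h_2 \circ h_1$.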

Let $V$ be a $\mathbb{C}$-vector space, $f \in \mathcal{L}(V, V)$ and let $U \neq \{0\}$ be a finite
dimensional $f$-invariant subspace of $V$. Show that $U$ contains at least one
eigenvector of $f$.

Let $V \neq \{0\}$ be a $K$-vector space and let $f \in \mathcal{L}(V, V)$. Show the following
statements:


(a) If $K = \mathbb{C}$, then there exists an $f$-invariant subspace $U$ of $V$ with
$\dim(U) = 1$.

(b) If $K = \mathbb{R}$, then there exists an $f$-invariant subspace $U$ of $V$ with
$\dim(U) \in \{1, 2\}$.


Prove Theorem 15.17. 

Construct an example showing that the condition f o g = g o f in Theo- 
rem 15.17 is sufficient but not necessary for the simultaneous unitary trian- 
gulation of f and g. 

Let $A \in K^{n,n}$ be a diagonal matrix with pairwise distinct diagonal entries
and $B \in K^{n,n}$ with $AB = BA$. Show that in this case $B$ is a diagonal matrix.
What can you say about $B$, when the diagonal entries of $A$ are not all pairwise
distinct?

Show that the matrices 


—] 1 01 
s=[ TA} a[o] 
commute and determine a unitary matrix $Q$ such that $Q^H A Q$ and $Q^H B Q$
are upper triangular.
Show the following statements for p € K [t]: 


(a) For all $A \in K^{n,n}$ and $S \in GL_n(K)$ we have $p(SAS^{-1}) = S\,p(A)\,S^{-1}$.

(b) For all $A, B, C \in K^{n,n}$ with $AB = CA$ we have $A\,p(B) = p(C)\,A$.

(c) If $K = \mathbb{C}$ and $A \in \mathbb{C}^{n,n}$, then there exists a unitary matrix $Q$, such that
$Q^H A Q$ and $Q^H p(A) Q$ are upper triangular.


Let $V$ be a finite dimensional unitary vector space. Let $f \in \mathcal{L}(V, V)$ be
normal, i.e., $f$ satisfies $f \circ f^{ad} = f^{ad} \circ f$.

(a) Show that if $\lambda \in \mathbb{C}$ is an eigenvalue of $f$, then $V_f(\lambda)^{\perp}$ is an $f$-invariant
subspace.

(b) Show (using (a)) that $f$ is diagonalizable. (Hint: Show by induction on
$\dim(V)$ that $V$ is the direct sum of the eigenspaces of $f$.)

(c) Show (using (a) or (b)) that $f$ is even unitarily diagonalizable, i.e., there
exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$ is a diagonal matrix.

(d) Let $g \in \mathcal{L}(V, V)$ be unitarily diagonalizable. Show that $g$ is normal.
(This shows that an endomorphism on a finite dimensional unitary vector
space is normal if and only if it is unitarily diagonalizable. We will give
a different proof of this result in Theorem 18.2.)


Let $V$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(V, V)$ and $V = U_1 \oplus U_2$,
where $U_1, U_2$ are $f$-invariant subspaces of $V$. Let, furthermore,
$f_j := f|_{U_j} \in \mathcal{L}(U_j, U_j)$, $j = 1, 2$.


(a) For every $v \in V$ there exist unique $u_1 \in U_1$ and $u_2 \in U_2$ with $v = u_1 + u_2$.
Show that then also $f(v) = f(u_1) + f(u_2) = f_1(u_1) + f_2(u_2)$.
(We write this as $f = f_1 \oplus f_2$ and call $f$ the direct sum of $f_1$ and $f_2$
with respect to the decomposition $V = U_1 \oplus U_2$.)

(b) Show that $\mathrm{rank}(f) = \mathrm{rank}(f_1) + \mathrm{rank}(f_2)$ and $P_f = P_{f_1} \cdot P_{f_2}$.

(c) Show that $a(\lambda, f) = a(\lambda, f_1) + a(\lambda, f_2)$ for all $\lambda \in K$.
(Here we set $a(\lambda, h) = 0$, if $\lambda$ is not an eigenvalue of $h \in \mathcal{L}(V, V)$.)

(d) Show that $g(\lambda, f) = g(\lambda, f_1) + g(\lambda, f_2)$ for all $\lambda \in K$.
(Here we set $g(\lambda, h) = \dim(\ker(\lambda\,\mathrm{Id}_V - h))$ even if $\lambda$ is not an eigenvalue
of $h \in \mathcal{L}(V, V)$.)

(e) Show that $p(f) = p(f_1) \oplus p(f_2)$ for all $p \in K[t]$.


Chapter 16 
Cyclic Subspaces, Duality and the Jordan 
Canonical Form 


In this chapter we use the duality theory to analyze the properties of an endomorphism
$f$ on a finite dimensional vector space $V$ in detail. We are particularly interested in the
algebraic and geometric multiplicities of the eigenvalues of $f$ and the characterization
of the corresponding eigenspaces. Our strategy in this analysis is to decompose the
vector space $V$ into a direct sum of $f$-invariant subspaces so that, with appropriately
chosen bases, the essential properties of $f$ will be obvious from its matrix representation.
The matrix representation that we derive is called the Jordan canonical form
of $f$. Because of its great importance there have been many different derivations of
this form using different mathematical tools. Our approach using duality theory is
based on an article by Vlastimil Pták (1925–1999) from 1956 [Pta56].


16.1 Cyclic f-invariant Subspaces and Duality 


Let $V$ be a finite dimensional $K$-vector space. If $f \in \mathcal{L}(V, V)$ and $v_0 \in V \setminus \{0\}$, then
there exists a uniquely defined smallest number $m \in \mathbb{N}$, such that the vectors

\[ v_0,\ f(v_0),\ \dots,\ f^{m-1}(v_0) \]

are linearly independent and the vectors

\[ v_0,\ f(v_0),\ \dots,\ f^{m-1}(v_0),\ f^m(v_0) \]

are linearly dependent. Obviously $m \le \dim(V)$, since at most $\dim(V)$ vectors of $V$
can be linearly independent. The number $m$ is called the grade of $v_0$ with respect to
$f$. We denote this grade by $m(f, v_0)$. The vector $v_0 = 0$ is linearly dependent, and
thus its grade is $0$ (with respect to any $f$).
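The definition of the grade can be turned into a small computation. The following Python sketch is only an illustration under our own assumptions: the endomorphism is given by a square matrix with rational entries, and the helper names `rank`, `apply_matrix` and `grade` are hypothetical, not notation from the text. It appends $f^k(v_0)$ until the vectors become linearly dependent.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a pivot in column c at or below row rk.
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def apply_matrix(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def grade(A, v0):
    """Grade m(f, v0) of v0 with respect to the map f: v -> A v."""
    if all(x == 0 for x in v0):
        return 0  # the zero vector has grade 0
    vectors = [list(v0)]  # invariant: these vectors are linearly independent
    while True:
        nxt = apply_matrix(A, vectors[-1])
        if rank(vectors + [nxt]) == len(vectors):
            return len(vectors)  # adding f^m(v0) makes the set dependent
        vectors.append(nxt)
```

For instance, with `A = [[2, 0], [0, 3]]` the eigenvector `[1, 0]` has grade 1, while `[1, 1]` is not an eigenvector and has grade 2.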
For $v_0 \neq 0$ we have $m(f, v_0) = 1$ if and only if the vectors $v_0, f(v_0)$ are linearly
dependent. This holds if and only if $v_0$ is an eigenvector of $f$. If $v_0 \neq 0$ is not an
eigenvector of $f$, then $m(f, v_0) \ge 2$.

For every $j \in \mathbb{N}$ we define the subspace

\[ \mathcal{K}_j(f, v_0) := \mathrm{span}\{v_0, f(v_0), \dots, f^{j-1}(v_0)\} \subseteq V. \]

The space $\mathcal{K}_j(f, v_0)$ is called the $j$th Krylov subspace¹ of $f$ and $v_0$.


Lemma 16.1 If $V$ is a finite dimensional $K$-vector space, $f \in \mathcal{L}(V, V)$, and $v_0 \in V$,
then the following assertions hold:

(1) If $m = m(f, v_0)$, then $\mathcal{K}_m(f, v_0)$ is an $f$-invariant subspace of $V$, and

\[ \mathrm{span}\{v_0\} = \mathcal{K}_1(f, v_0) \subset \mathcal{K}_2(f, v_0) \subset \cdots \subset \mathcal{K}_m(f, v_0) = \mathcal{K}_{m+j}(f, v_0) \]

for all $j \in \mathbb{N}$.

(2) If $m = m(f, v_0)$ and $U \subseteq V$ is an $f$-invariant subspace that contains the vector
$v_0$, then $\mathcal{K}_m(f, v_0) \subseteq U$. Thus, among all $f$-invariant subspaces of $V$ that contain
the vector $v_0$, the Krylov subspace $\mathcal{K}_m(f, v_0)$ is the one of smallest dimension.

(3) If $f^{m-1}(v_0) \neq 0$ and $f^m(v_0) = 0$ for an $m \in \mathbb{N}$, then $\dim(\mathcal{K}_j(f, v_0)) = j$ for
$j = 1, \dots, m$.

Proof

(1) Exercise.

(2) The assertion is trivial if $v_0 = 0$. Thus, let $v_0 \neq 0$ with $m = m(f, v_0) \ge 1$ and
let $U \subseteq V$ be an $f$-invariant subspace that contains $v_0$. Then $U$ also contains
the vectors $f(v_0), \dots, f^{m-1}(v_0)$, so that $\mathcal{K}_m(f, v_0) \subseteq U$ and, in particular,
$\dim(U) \ge m = \dim(\mathcal{K}_m(f, v_0))$.

(3) Let $\gamma_0, \dots, \gamma_{m-1} \in K$ with

\[ 0 = \gamma_0 v_0 + \dots + \gamma_{m-1} f^{m-1}(v_0). \]

If we apply $f^{m-1}$ to both sides, then $0 = \gamma_0 f^{m-1}(v_0)$ and thus $\gamma_0 = 0$, since
$f^{m-1}(v_0) \neq 0$. If $m > 1$, then we apply inductively $f^{m-k}$ for $k = 2, \dots, m$ and
obtain $\gamma_1 = \dots = \gamma_{m-1} = 0$. Thus, the vectors $v_0, \dots, f^{m-1}(v_0)$ are linearly
independent, which implies that $\dim(\mathcal{K}_j(f, v_0)) = j$ for $j = 1, \dots, m$. □


The vectors $v_0, f(v_0), \dots, f^{m-1}(v_0)$ form, by construction, a basis of the Krylov
subspace $\mathcal{K}_m(f, v_0)$. The application of $f$ to a vector $f^k(v_0)$ of this
basis yields the next basis vector $f^{k+1}(v_0)$, $k = 0, 1, \dots, m-2$, and the application
of $f$ to the last vector $f^{m-1}(v_0)$ yields a linear combination of all basis vectors, since
$f^m(v_0) \in \mathcal{K}_m(f, v_0)$. Due to this special structure, the subspace $\mathcal{K}_m(f, v_0)$ is called
a cyclic $f$-invariant subspace.


¹ Aleksey Nikolaevich Krylov (1863–1945).
Definition 16.2 Let $V \neq \{0\}$ be a $K$-vector space. An endomorphism $f \in \mathcal{L}(V, V)$
is called nilpotent, if $f^m = 0$ holds for an $m \in \mathbb{N}$. If at the same time $f^{m-1} \neq 0$,
then $f$ is called nilpotent of index $m$.


The zero map $f = 0$ is the only nilpotent endomorphism of index $m = 1$. If
$V = \{0\}$, then the zero map is the only endomorphism on $V$. This map is nilpotent
of index $m = 1$, where in this case we omit the requirement $f^{m-1} = f^0 \neq 0$.

If $f$ is nilpotent of index $m$ and $v \neq 0$ is any vector with $f^{m-1}(v) \neq 0$, then
$f(f^{m-1}(v)) = f^m(v) = 0 = 0 \cdot f^{m-1}(v)$. Hence $f^{m-1}(v)$ is an eigenvector of $f$
corresponding to the eigenvalue 0. Our construction in Sect. 16.2 will show that 0 is
the only eigenvalue of a nilpotent endomorphism (also cp. Exercise 8.3).


Lemma 16.3 If $V \neq \{0\}$ is a $K$-vector space and if $f \in \mathcal{L}(V, V)$ is nilpotent of
index $m$, then $m \le \dim(V)$.


Proof If $f$ is nilpotent of index $m$, then there exists a $v_0 \in V$ with $f^{m-1}(v_0) \neq 0$
and $f^m(v_0) = 0$. By (3) in Lemma 16.1 the $m$ vectors $v_0, \dots, f^{m-1}(v_0)$ are linearly
independent, which implies that $m \le \dim(V)$. □


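Example 16.4 In the vector space $K^{3,1}$ the endomorphism

\[ f : K^{3,1} \to K^{3,1}, \qquad \begin{bmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ \nu_1 \\ \nu_2 \end{bmatrix}, \]

is nilpotent of index 3, since $f \neq 0$, $f^2 \neq 0$ and $f^3 = 0$.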
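As a quick check of this example, the following Python sketch computes the index of nilpotency of a square matrix by forming successive powers. It assumes that the map of Example 16.4 is $[\nu_1, \nu_2, \nu_3]^T \mapsto [0, \nu_1, \nu_2]^T$, represented by the matrix `N` below; the function names are our own and purely illustrative. By Lemma 16.3 it suffices to test the powers up to the dimension.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def nilpotency_index(A):
    """Smallest m >= 1 with A^m = 0, or None if A is not nilpotent."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    for m in range(1, n + 1):
        power = matmul(power, A)
        if all(x == 0 for row in power for x in row):
            return m
    return None  # no power up to A^n vanishes, so A is not nilpotent


# Matrix of [v1, v2, v3]^T -> [0, v1, v2]^T with respect to the canonical basis.
N = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
print(nilpotency_index(N))
```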


If $U$ is an $f$-invariant subspace of $V$, then $f|_U \in \mathcal{L}(U, U)$, where

\[ f|_U : U \to U, \quad u \mapsto f(u), \]

is the restriction of $f$ to the subspace $U$ (cp. Definition 2.12).


Theorem 16.5 Let $V$ be a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$.
Then there exist $f$-invariant subspaces $U_1 \subseteq V$ and $U_2 \subseteq V$ with $V = U_1 \oplus U_2$, such
that $f|_{U_1} \in \mathcal{L}(U_1, U_1)$ is bijective and $f|_{U_2} \in \mathcal{L}(U_2, U_2)$ is nilpotent.


Proof If $v \in \ker(f)$, then $f^2(v) = f(f(v)) = f(0) = 0$. Thus, $v \in \ker(f^2)$ and
therefore $\ker(f) \subseteq \ker(f^2)$. Proceeding inductively we see that

\[ \{0\} \subseteq \ker(f) \subseteq \ker(f^2) \subseteq \ker(f^3) \subseteq \cdots. \]

Since $V$ is finite dimensional, there exists a smallest number $m \in \mathbb{N}_0$ with
$\ker(f^m) = \ker(f^{m+j})$ for all $j \in \mathbb{N}$. For this number $m$ let

\[ U_1 := \mathrm{im}(f^m), \qquad U_2 := \ker(f^m). \]

(If $f$ is bijective, then $m = 0$, $U_1 = V$ and $U_2 = \{0\}$.) We now show that the spaces
$U_1$ and $U_2$ satisfy the assertion.

First observe that $U_1$ and $U_2$ are both $f$-invariant: If $v \in U_1$, then $v = f^m(w)$ for
some $w \in V$, and therefore $f(v) = f(f^m(w)) = f^m(f(w)) \in U_1$. If $v \in U_2$, then
$f^m(f(v)) = f(f^m(v)) = f(0) = 0$, and therefore $f(v) \in U_2$.

We have $U_1 + U_2 \subseteq V$. An application of the dimension formula for linear maps
(cp. Theorem 10.9) to $f^m$ gives $\dim(V) = \dim(U_1) + \dim(U_2)$. If $v \in U_1 \cap U_2$, then
$v = f^m(w)$ for some $w \in V$ (since $v \in U_1$) and hence

\[ 0 = f^m(v) = f^m(f^m(w)) = f^{2m}(w). \]

The first equation holds since $v \in U_2$. By the definition of $m$ we have
$\ker(f^m) = \ker(f^{2m})$, which implies $f^m(w) = 0$, and therefore $v = f^m(w) = 0$.
From $U_1 \cap U_2 = \{0\}$ we obtain $V = U_1 \oplus U_2$.

Let now $v \in \ker(f|_{U_1}) \subseteq U_1$ be given. Since $v \in U_1$, there exists a vector $w \in V$
with $v = f^m(w)$, which implies $0 = f(v) = f(f^m(w)) = f^{m+1}(w)$. By the
definition of $m$ we have $\ker(f^m) = \ker(f^{m+1})$, thus $w \in \ker(f^m)$, and therefore
$v = f^m(w) = 0$. This implies that $\ker(f|_{U_1}) = \{0\}$, i.e., $f|_{U_1}$ is injective and thus
also bijective (cp. Corollary 10.11).

If, on the other hand, $v \in U_2$, then by definition $0 = f^m(v) = (f|_{U_2})^m(v)$, and
thus $(f|_{U_2})^m$ is the zero map in $\mathcal{L}(U_2, U_2)$, so that $f|_{U_2}$ is nilpotent. □
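The number $m$ from this proof can be found numerically for a concrete matrix. The sketch below is a minimal illustration under our own assumptions: $f$ is given by a rational square matrix, and `stabilization_index` is a hypothetical helper name. It uses the fact that $\ker(f^k) \subseteq \ker(f^{k+1})$, so the kernels coincide exactly when the ranks of the two powers coincide, and that the chain stays constant once two consecutive kernels are equal.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def stabilization_index(A):
    """Return (m, dim im(A^m), dim ker(A^m)) for the smallest m >= 0
    with ker(A^m) = ker(A^(m+1))."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    m = 0
    while True:
        nxt = matmul(power, A)
        if rank(nxt) == rank(power):
            r = rank(power)
            return m, r, n - r  # dim(U1) = r, dim(U2) = n - r
        power, m = nxt, m + 1


# A nilpotent 2x2 block together with an invertible 1x1 block.
A = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 2]]
print(stabilization_index(A))
```

Here the two dimensions add up to $\dim(V) = 3$, in agreement with $V = U_1 \oplus U_2$.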


For the further development we recall some terms and results from Chap. 11. Let
$V$ be a finite dimensional $K$-vector space and let $V^*$ be the dual space of $V$. If $U \subseteq V$
and $W \subseteq V^*$ are two subspaces and if the bilinear form

\[ \beta : U \times W \to K, \quad (v, h) \mapsto h(v), \tag{16.1} \]

is non-degenerate, then $U, W$ is called a dual pair with respect to $\beta$. This requires that
$\dim(U) = \dim(W)$. For $f \in \mathcal{L}(V, V)$ the dual map $f^* \in \mathcal{L}(V^*, V^*)$ is defined by

\[ f^* : V^* \to V^*, \quad h \mapsto h \circ f. \]

For all $v \in V$ and $h \in V^*$ we have $(f^*(h))(v) = h(f(v))$. Furthermore, $(f^k)^* = (f^*)^k$
for all $k \in \mathbb{N}_0$. The set

\[ U^0 := \{h \in V^* \mid h(u) = 0 \text{ for all } u \in U\} \]

is called the annihilator of $U$. This set is a subspace of $V^*$ (cp. Exercise 11.5).
Analogously, the set

\[ W^0 := \{v \in V \mid h(v) = 0 \text{ for all } h \in W\} \]

is called the annihilator of $W$. This set is a subspace of $V$.
Lemma 16.6 Let $V$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(V, V)$, $V^*$ the
dual space of $V$, $f^* \in \mathcal{L}(V^*, V^*)$ the dual map of $f$, and let $U \subseteq V$ and $W \subseteq V^*$ be
two subspaces. Then the following assertions hold:

(1) $\dim(V) = \dim(W) + \dim(W^0) = \dim(U) + \dim(U^0)$.

(2) If $f$ is nilpotent of index $m$, then $f^*$ is nilpotent of index $m$.

(3) If $W \subseteq V^*$ is an $f^*$-invariant subspace, then $W^0 \subseteq V$ is an $f$-invariant subspace.

(4) If $U, W$ are a dual pair with respect to the bilinear form defined in (16.1), then
$V = U \oplus W^0$.


Proof

(1) Exercise.

(2) For all $v \in V$ we have $f^m(v) = 0$ and hence,

\[ 0 = h(f^m(v)) = ((f^m)^*(h))(v) = ((f^*)^m(h))(v) \]

for every $h \in V^*$ and $v \in V$, so that $f^*$ is nilpotent of index at most $m$.
If $(f^*)^{m-1} = 0$, then $(f^*)^{m-1}(h) = 0$ for all $h \in V^*$, and therefore
$0 = ((f^*)^{m-1}(h))(v) = h(f^{m-1}(v))$ for all $v \in V$. This implies that $f^{m-1} = 0$,
in contradiction to the assumption that $f$ is nilpotent of index $m$. Thus, $f^*$ is
nilpotent of index $m$.

(3) Let $w \in W^0$. For every $h \in W$, we have $f^*(h) \in W$, and thus $0 = f^*(h)(w) =
h(f(w))$. Hence $f(w) \in W^0$.

(4) If $u \in U \cap W^0$, then $h(u) = 0$ for all $h \in W$, since $u \in W^0$. Since $U, W$ is
a dual pair with respect to the bilinear form defined in (16.1), we have $u = 0$.
Moreover, $\dim(U) = \dim(W)$ and using (1) we obtain

\[ \dim(V) = \dim(W) + \dim(W^0) = \dim(U) + \dim(W^0). \]

From $U \cap W^0 = \{0\}$ we obtain $V = U \oplus W^0$. □


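Example 16.7 We consider the vector space $V = \mathbb{R}^{2,1}$ with the canonical basis
$B = \{e_1, e_2\}$. For the subspaces

\[ U = \mathrm{span}\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \subset V, \qquad
W = \{h \in V^* \mid [h]_{B,\{1\}} = [\alpha, \alpha] \text{ for an } \alpha \in \mathbb{R}\} \subset V^*, \]

we have

\[ U^0 = \{h \in V^* \mid [h]_{B,\{1\}} = [\alpha, 0] \text{ for an } \alpha \in \mathbb{R}\} \subset V^*, \qquad
W^0 = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\} \subset V. \]

In this example, we easily see that $\dim(V) = \dim(W) + \dim(W^0) = \dim(U) + \dim(U^0)$,
and that $U, W$ form a dual pair with respect to the bilinear form defined in
(16.1) with $K = \mathbb{R}$. Moreover, $V = U \oplus W^0$.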


The following theorem presents, for a given nilpotent $f$, a decomposition of $V$
into $f$-invariant subspaces. The idea of the decomposition is to construct a dual pair
of subspaces $U \subseteq V$ and $W \subseteq V^*$, where $U$ is $f$-invariant and $W$ is $f^*$-invariant.
By (3) in Lemma 16.6 then $W^0$ is $f$-invariant and with (4) in Lemma 16.6 it follows
that $V = U \oplus W^0$.


Theorem 16.8 Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$
be nilpotent of index $m$. Let $v_0 \in V$ satisfy $f^{m-1}(v_0) \neq 0$ and let $h_0 \in V^*$ satisfy
$h_0(f^{m-1}(v_0)) \neq 0$.

Then $m(f, v_0) = m(f^*, h_0) = m$, and the $f$- and $f^*$-invariant subspaces $\mathcal{K}_m(f, v_0) \subseteq V$
and $\mathcal{K}_m(f^*, h_0) \subseteq V^*$, respectively, are a dual pair with respect to the bilinear
form defined in (16.1). Furthermore,

\[ V = \mathcal{K}_m(f, v_0) \oplus (\mathcal{K}_m(f^*, h_0))^0, \]

where $(\mathcal{K}_m(f^*, h_0))^0$ is an $f$-invariant subspace of $V$.


Proof Let $v_0 \in V$ be a vector with $f^{m-1}(v_0) \neq 0$. Since $f^m(v_0) = 0$, the space
$\mathcal{K}_m(f, v_0)$ is an $m$-dimensional $f$-invariant subspace of $V$ (cp. (3) in Lemma 16.1).
Let $h_0 \in V^*$ be a vector with

\[ 0 \neq h_0(f^{m-1}(v_0)) = ((f^*)^{m-1}(h_0))(v_0). \]

Then, in particular, $0 \neq (f^*)^{m-1}(h_0) \in V^*$. Since $f$ is nilpotent of index $m$,
also $f^*$ is nilpotent of index $m$ (cp. (2) in Lemma 16.6), so that

\[ (f^*)^m(h_0) = 0 \in V^*. \]

Therefore, $\mathcal{K}_m(f^*, h_0)$ is an $m$-dimensional $f^*$-invariant subspace of $V^*$ (cp. (3) in
Lemma 16.1).

It remains to show that $\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ are a dual pair. Let

\[ v_1 = \sum_{j=0}^{m-1} \gamma_j f^j(v_0) \in \mathcal{K}_m(f, v_0) \]

be a vector with $h(v_1) = \beta(v_1, h) = 0$ for all $h \in \mathcal{K}_m(f^*, h_0)$. We show inductively
that then $\gamma_0 = \dots = \gamma_{m-1} = 0$, and thus $v_1 = 0$.

Using $(f^*)^{m-1}(h_0) \in \mathcal{K}_m(f^*, h_0)$ our assumption on the vector $v_1$ yields

\[ 0 = ((f^*)^{m-1}(h_0))(v_1) = h_0(f^{m-1}(v_1)) = \sum_{j=0}^{m-1} \gamma_j h_0(f^{m-1+j}(v_0)) = \gamma_0 h_0(f^{m-1}(v_0)). \]

The last equation holds, since $f^{m-1+j}(v_0) = 0$ for $j = 1, \dots, m-1$ (because
$f^m = 0$). From $h_0(f^{m-1}(v_0)) \neq 0$ we obtain $\gamma_0 = 0$.

Suppose now that $\gamma_0 = \dots = \gamma_{k-1} = 0$ for a $k$, $1 \le k \le m-1$. Using
$(f^*)^{m-1-k}(h_0) \in \mathcal{K}_m(f^*, h_0)$ our assumption on the vector $v_1$ yields

\[ 0 = ((f^*)^{m-1-k}(h_0))(v_1) = h_0(f^{m-1-k}(v_1)) = \sum_{j=0}^{m-1} \gamma_j h_0(f^{m-1-k+j}(v_0)) = \gamma_k h_0(f^{m-1}(v_0)). \]

The last equation holds, since $\gamma_j = 0$ for $j = 0, \dots, k-1$ and $f^{m-1-k+j}(v_0) = 0$
for $j = k+1, \dots, m-1$. Hence $\gamma_k = 0$.

We have $v_1 = 0$ as asserted, and therefore the bilinear form defined in (16.1)
for the spaces $\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ is non-degenerate in the first variable.
Analogously, the bilinear form is non-degenerate in the second variable, and hence
$\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ are a dual pair.

Using (4) in Lemma 16.6 we now have $V = \mathcal{K}_m(f, v_0) \oplus (\mathcal{K}_m(f^*, h_0))^0$, where
the space $(\mathcal{K}_m(f^*, h_0))^0$ is, by (3) in Lemma 16.6, an $f$-invariant subspace of $V$. □


16.2 The Jordan Canonical Form 


Let $V$ be a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$. If there exists
a basis $B$ of $V$ consisting of eigenvectors of $f$, then $[f]_{B,B}$ is a diagonal matrix,
i.e., $f$ is diagonalizable. A necessary and sufficient condition for this is that the
characteristic polynomial $P_f$ decomposes into linear factors over $K$ and that in
addition $g(f, \lambda_j) = a(f, \lambda_j)$ for every eigenvalue $\lambda_j$ (cp. Theorem 14.14).

If $P_f$ decomposes into linear factors but $g(f, \lambda_j) < a(f, \lambda_j)$ holds for at
least one eigenvalue $\lambda_j$, then $f$ is not diagonalizable but can still be triangulated,
i.e., there exists a basis $B$ of $V$, such that $[f]_{B,B}$ is an upper triangular matrix
(cp. Theorem 14.17). From this triangular matrix we can read off the algebraic, but
usually not the geometric multiplicities of the eigenvalues. The goal of the following
construction is to determine a basis $B$ of $V$, so that $[f]_{B,B}$ is upper triangular and in
addition to the algebraic also reveals the geometric multiplicities of the eigenvalues.

Under the assumption that $P_f$ decomposes into linear factors over $K$, we will
construct a basis $B$ of $V$ for which $[f]_{B,B}$ is a block diagonal matrix of the form
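\[ [f]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix}, \]

where each diagonal block has the form

\[ J_{d_j}(\lambda_j) := \begin{bmatrix} \lambda_j & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_j \end{bmatrix} \in K^{d_j, d_j} \tag{16.2} \]

for some $\lambda_j \in K$ and $d_j \in \mathbb{N}$, $j = 1, \dots, m$. A matrix of the form (16.2) is called a
Jordan block of size $d_j$ corresponding to the eigenvalue $\lambda_j$.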

In the following construction we first do not assume that $P_f$ decomposes into
linear factors. We only assume the existence of a single eigenvalue $\lambda_1 \in K$ of $f$.
Using this eigenvalue, we define the endomorphism

\[ g := f - \lambda_1 \mathrm{Id}_V \in \mathcal{L}(V, V). \]

By Theorem 16.5 there exist $g$-invariant subspaces $U \subseteq V$ and $W \subseteq V$ with

\[ V = U \oplus W, \]

such that

\[ g_1 := g|_U \]

is nilpotent and $g|_W$ is bijective. Then $U \neq \{0\}$, since otherwise $W = V$ and
$g|_W = g|_V = g$ would be bijective, which contradicts the assumption that $\lambda_1$ is an
eigenvalue of $f$.

Let $g_1$ be nilpotent of index $d_1$. Then by construction $1 \le d_1 \le \dim(U)$. Let
$w_1 \in U$ be a vector with $g_1^{d_1-1}(w_1) \neq 0$. Since $g_1^{d_1}(w_1) = 0$, the vector $g_1^{d_1-1}(w_1)$ is
an eigenvector of $g_1$ corresponding to the eigenvalue 0. By (3) in Lemma 16.1, the $d_1$
vectors

\[ w_1,\ g_1(w_1),\ \dots,\ g_1^{d_1-1}(w_1) \]

are linearly independent and $U_1 := \mathcal{K}_{d_1}(g_1, w_1)$ is a $d_1$-dimensional $g_1$-invariant
subspace of $U$.
Consider the basis

\[ B_1 := \{ g_1^{d_1-1}(w_1),\ \dots,\ g_1(w_1),\ w_1 \} \]

of $U_1$. Then the matrix representation of $g_1|_{U_1}$ with respect to the basis $B_1$ is given by

\[ [g_1|_{U_1}]_{B_1,B_1} = J_{d_1}(0) \in K^{d_1,d_1}. \]
This shows, in particular, that the characteristic polynomial of $g_1|_{U_1}$ is given by the
monomial $t^{d_1}$, and hence 0 is the only eigenvalue of $g_1|_{U_1}$. Moreover, by construction
$[g_1|_{U_1}]_{B_1,B_1} = [g|_{U_1}]_{B_1,B_1}$.

If $d_1 = \dim(U)$, then our construction is complete for the moment. If, on the other
hand, $d_1 < \dim(U)$, then applying Theorem 16.8 to $g_1 \in \mathcal{L}(U, U)$ shows that there
exists a $g_1$-invariant subspace $\widetilde{U} \neq \{0\}$ with $U = U_1 \oplus \widetilde{U}$, and we consider

\[ g_2 := g_1|_{\widetilde{U}}. \]

This map is nilpotent of index $d_2$, where $1 \le d_2 \le d_1$. We now carry out the same
construction as before:

We determine a vector $w_2 \in \widetilde{U}$ with $g_2^{d_2-1}(w_2) \neq 0$. Then $g_2^{d_2-1}(w_2)$ is an
eigenvector of $g_2$, $U_2 := \mathcal{K}_{d_2}(g_2, w_2)$ is a $d_2$-dimensional $g_2$-invariant subspace of
$\widetilde{U} \subseteq U$ and for the basis

\[ B_2 := \{ g_2^{d_2-1}(w_2),\ \dots,\ g_2(w_2),\ w_2 \} \]

of $U_2$ we have

\[ [g_2|_{U_2}]_{B_2,B_2} = J_{d_2}(0) \in K^{d_2,d_2}, \]

where again $[g_2|_{U_2}]_{B_2,B_2} = [g|_{U_2}]_{B_2,B_2}$ by construction.


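After $k \le \dim(U)$ steps this procedure terminates. We then have found a
decomposition of $U$ of the form

\[ U = \mathcal{K}_{d_1}(g_1, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(g_k, w_k) = \mathcal{K}_{d_1}(g, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(g, w_k). \]

In the second equation we have used that $\mathcal{K}_{d_j}(g_j, w_j) = \mathcal{K}_{d_j}(g, w_j)$ for $j = 1, \dots, k$.
If we combine the constructed bases $B_1, \dots, B_k$ to a basis $B$ of $U$, then

\[ [g|_U]_{B,B} = \begin{bmatrix} [g|_{U_1}]_{B_1,B_1} & & \\ & \ddots & \\ & & [g|_{U_k}]_{B_k,B_k} \end{bmatrix}
= \begin{bmatrix} J_{d_1}(0) & & \\ & \ddots & \\ & & J_{d_k}(0) \end{bmatrix}. \]

Thus, the nilpotent endomorphism $g_1 = g|_U$ has the characteristic polynomial
$t^{d_1 + \dots + d_k}$ and its only eigenvalue is 0.
We now transfer these results to

\[ f = g + \lambda_1 \mathrm{Id}_V. \]

Every $g$-invariant subspace is $f$-invariant and one observes easily that

\[ \mathcal{K}_{d_j}(f, w_j) = \mathcal{K}_{d_j}(g, w_j), \quad j = 1, \dots, k \]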
(cp. Exercise 16.3). Hence, it follows that

\[ U = \mathcal{K}_{d_1}(f, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(f, w_k). \]

For every $j = 1, \dots, k$ and $0 \le \ell \le d_j - 1$ we have

\[ f(g^{\ell}(w_j)) = g(g^{\ell}(w_j)) + \lambda_1 g^{\ell}(w_j) = \lambda_1 g^{\ell}(w_j) + g^{\ell+1}(w_j), \tag{16.3} \]

where $g^{d_j}(w_j) = 0$. The matrix representation of $f|_U$ with respect to the basis $B$ of
$U$ is therefore given by

\[ [f|_U]_{B,B} = \begin{bmatrix} [f|_{U_1}]_{B_1,B_1} & & \\ & \ddots & \\ & & [f|_{U_k}]_{B_k,B_k} \end{bmatrix}
= \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_k}(\lambda_1) \end{bmatrix}. \tag{16.4} \]

The map $g|_W = f|_W - \lambda_1 \mathrm{Id}_W$ is bijective by construction, i.e., $\lambda_1$ is not an
eigenvalue of $f|_W$. Therefore, $a(f, \lambda_1) = \dim(U) = d_1 + \dots + d_k$. In order to
determine $g(f, \lambda_1)$, let $v \in U$ be an arbitrary vector. Then there exist scalars $\alpha_{j,\ell} \in K$
with

\[ v = \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, g^{\ell}(w_j). \]

Using (16.3) we obtain

\[ f(v) = \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, f(g^{\ell}(w_j))
= \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \lambda_1 \alpha_{j,\ell}\, g^{\ell}(w_j) + \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, g^{\ell+1}(w_j)
= \lambda_1 v + \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-2} \alpha_{j,\ell}\, g^{\ell+1}(w_j). \]

The vectors in the last sum are linearly independent. Hence, $f(v) = \lambda_1 v$ if and
only if $\alpha_{j,\ell} = 0$ for $j = 1, \dots, k$ and $\ell = 0, 1, \dots, d_j - 2$. This shows that every
eigenvector of $f$ corresponding to the eigenvalue $\lambda_1$ has the form

\[ v = \sum_{j=1}^{k} \alpha_{j,d_j-1}\, g^{d_j-1}(w_j), \]

where at least one $\alpha_{j,d_j-1}$ is nonzero, so that we have

\[ V_f(\lambda_1) = \mathrm{span}\{ g^{d_1-1}(w_1), \dots, g^{d_k-1}(w_k) \}. \]

Since $g^{d_1-1}(w_1), \dots, g^{d_k-1}(w_k)$ are linearly independent, it follows that $g(f, \lambda_1) = k$.
The geometric multiplicity of the eigenvalue $\lambda_1$ therefore is equal to the number of
Jordan blocks corresponding to the eigenvalue $\lambda_1$ in the matrix representation (16.4).
Furthermore, we observe that in every subspace $\mathcal{K}_{d_j}(f, w_j)$, the endomorphism $f$
has exactly one (linearly independent) eigenvector corresponding to the eigenvalue $\lambda_1$.
We summarize these results in the following theorem.


Theorem 16.9 Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$.
If $\lambda_1 \in K$ is an eigenvalue of $f$, then the following assertions hold:

(1) There exist $f$-invariant subspaces $\{0\} \neq U \subseteq V$ and $W \subseteq V$ with $V = U \oplus W$.
The map $f|_U - \lambda_1 \mathrm{Id}_U$ is nilpotent and the map $f|_W - \lambda_1 \mathrm{Id}_W$ is bijective. In
particular, $\lambda_1$ is not an eigenvalue of $f|_W$.

(2) The subspace $U$ from (1) can be written as

\[ U = \mathcal{K}_{d_1}(f, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(f, w_k) \]

for some vectors $w_1, \dots, w_k \in U$, where $\mathcal{K}_{d_j}(f, w_j)$ is a $d_j$-dimensional
$f$-invariant subspace of $V$, $j = 1, \dots, k$. This is called a cyclic decomposition
of $U$.

(3) There exists a basis $B$ of $U$ with

\[ [f|_U]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_k}(\lambda_1) \end{bmatrix}. \]

(4) We have $a(f, \lambda_1) = d_1 + \dots + d_k$ and $g(f, \lambda_1) = k$.


If $f$ has a further eigenvalue $\lambda_2 \neq \lambda_1$, then it is an eigenvalue of the restriction
$f|_W \in \mathcal{L}(W, W)$ and we can apply Theorem 16.9 to $f|_W$. The vector space $W$
then is the direct sum of the form $W = X \oplus Y$, where $f|_X - \lambda_2 \mathrm{Id}_X$ is nilpotent and
$f|_Y - \lambda_2 \mathrm{Id}_Y$ is bijective. The space $X$ has a cyclic decomposition analogous to (2)
in Theorem 16.9, and there exists a matrix representation of $f|_X$ analogous to (3).

This construction can be carried out for all eigenvalues of $f$. If the characteristic
polynomial $P_f$ decomposes into linear factors over $K$, then we finally obtain a cyclic
decomposition of the entire space $V$, which gives the following theorem.


Theorem 16.10 Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$.
If the characteristic polynomial $P_f$ decomposes into linear factors over $K$, then there
exists a basis $B$ of $V$, such that

\[ [f]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix}, \tag{16.5} \]

where $\lambda_1, \dots, \lambda_m \in K$ are the (not necessarily pairwise distinct) eigenvalues of $f$.
For every eigenvalue $\lambda_j$ of $f$ then $a(f, \lambda_j)$ is equal to the sum of the sizes of all
Jordan blocks corresponding to $\lambda_j$ in (16.5), and $g(f, \lambda_j)$ is equal to the number
of Jordan blocks corresponding to $\lambda_j$ in (16.5). A matrix representation of the form
(16.5) is called a Jordan canonical form² of $f$.


From Theorem 14.14 we know that $f \in \mathcal{L}(V, V)$ is diagonalizable if and only
if $P_f$ decomposes into linear factors over $K$ and $g(f, \lambda_j) = a(f, \lambda_j)$ holds for
every eigenvalue $\lambda_j$ of $f$. If $P_f$ decomposes into linear factors, then the Jordan
canonical form (16.5) shows that $g(f, \lambda_j) = a(f, \lambda_j)$ if and only if every Jordan
block corresponding to $\lambda_j$ is of size 1.

The Fundamental Theorem of Algebra yields the following corollary of Theo- 
rem 16.10. 


Corollary 16.11 If $V$ is a finite dimensional $\mathbb{C}$-vector space, then every $f \in \mathcal{L}(V, V)$
has a Jordan canonical form.


The following uniqueness result justifies the name canonical form. 


Theorem 16.12 Let V be a finite dimensional K -vector space. If f € L(V, V) has 
a Jordan canonical form, then it is unique up to the order of the Jordan blocks on 
the diagonal. 


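Proof Let $\dim(V) = n$ and let $B_1, B_2$ be two bases of $V$ with

\[ A_1 := [f]_{B_1,B_1} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix} \in K^{n,n} \]

as well as

\[ A_2 := [f]_{B_2,B_2} = \begin{bmatrix} J_{c_1}(\mu_1) & & \\ & \ddots & \\ & & J_{c_k}(\mu_k) \end{bmatrix} \in K^{n,n}. \]

For a given eigenvalue $\lambda_j$, $1 \le j \le m$, we define

\[ r_s^{(1)}(\lambda_j) := \mathrm{rank}\big((A_1 - \lambda_j I_n)^s\big), \quad s = 0, 1, 2, \dots \]

Then

\[ d_s^{(1)}(\lambda_j) := r_{s-1}^{(1)}(\lambda_j) - r_s^{(1)}(\lambda_j), \quad s = 1, 2, \dots, \]

is equal to the number of Jordan blocks $J_\ell(\lambda_j) \in K^{\ell,\ell}$ on the diagonal of $A_1$ with
$\ell \ge s$. The number of Jordan blocks corresponding to the eigenvalue $\lambda_j$ with exact
size $s$ therefore is given by

\[ d_s^{(1)}(\lambda_j) - d_{s+1}^{(1)}(\lambda_j) = r_{s-1}^{(1)}(\lambda_j) - 2 r_s^{(1)}(\lambda_j) + r_{s+1}^{(1)}(\lambda_j) \tag{16.6} \]

(cp. Example 16.13).
The matrices $A_1$ and $A_2$ are similar and, therefore, have the same eigenvalues,
i.e.,

\[ \{\lambda_1, \dots, \lambda_m\} = \{\mu_1, \dots, \mu_k\}. \]

Furthermore,

\[ \mathrm{rank}\big((A_1 - \alpha I_n)^s\big) = \mathrm{rank}\big((A_2 - \alpha I_n)^s\big) \]

for all $\alpha \in K$ and $s \in \mathbb{N}_0$.
In particular, for every $\lambda_j$ there exists $\mu_i \in \{\mu_1, \dots, \mu_k\}$ with $\mu_i = \lambda_j$ and for
this $\mu_i$ and the matrix $A_2$ we get

\[ r_s^{(2)}(\mu_i) := \mathrm{rank}\big((A_2 - \mu_i I_n)^s\big) = r_s^{(1)}(\lambda_j), \quad s = 0, 1, 2, \dots \]

Now (16.6) shows that the matrix $A_2$ has, up to reordering, the same Jordan blocks
on the diagonal as the matrix $A_1$. □


² Marie Ennemond Camille Jordan (1838–1922) derived this form in 1870. Two years earlier, Karl
Weierstraß (1815–1897) proved a result that implies the Jordan canonical form.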


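Example 16.13 This example illustrates the construction in the proof of Theorem 16.12. If

\[ A = \begin{bmatrix} J_2(1) & & \\ & J_1(1) & \\ & & J_2(0) \end{bmatrix}
= \begin{bmatrix} 1 & 1 & & & \\ 0 & 1 & & & \\ & & 1 & & \\ & & & 0 & 1 \\ & & & 0 & 0 \end{bmatrix} \in \mathbb{R}^{5,5}, \tag{16.7} \]

then $(A - 1 \cdot I_5)^0 = I_5$,

\[ A - 1 \cdot I_5 = \begin{bmatrix} 0 & 1 & & & \\ 0 & 0 & & & \\ & & 0 & & \\ & & & -1 & 1 \\ & & & 0 & -1 \end{bmatrix}, \qquad
(A - 1 \cdot I_5)^2 = \begin{bmatrix} 0 & 0 & & & \\ 0 & 0 & & & \\ & & 0 & & \\ & & & 1 & -2 \\ & & & 0 & 1 \end{bmatrix}, \]

and we get

\[ r_0(1) = 5, \quad r_1(1) = 3, \quad r_s(1) = 2, \ s \ge 2, \]

\[ d_1(1) = 2, \quad d_2(1) = 1, \quad d_s(1) = 0, \ s \ge 3, \]

\[ d_1(1) - d_2(1) = 1, \quad d_2(1) - d_3(1) = 1, \quad d_s(1) - d_{s+1}(1) = 0, \ s \ge 3. \]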
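The rank formula (16.6) translates directly into a computation. The following Python sketch is an illustration only: `jordan_block_counts` is a hypothetical helper name, exact rational arithmetic is used, and the matrix `A` is taken to be $\mathrm{diag}(J_2(1), J_1(1), J_2(0))$ as in (16.7). For a given eigenvalue it returns how many Jordan blocks of each exact size occur.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def jordan_block_counts(A, lam):
    """Map each block size s to the number of Jordan blocks of exact size s
    for the eigenvalue lam, using r_{s-1} - 2 r_s + r_{s+1} from (16.6)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    r = []  # r[s] = rank((A - lam I)^s) for s = 0, ..., n + 1
    for _ in range(n + 2):
        r.append(rank(power))
        power = matmul(power, shifted)
    counts = {}
    for s in range(1, n + 1):
        c = r[s - 1] - 2 * r[s] + r[s + 1]
        if c > 0:
            counts[s] = c
    return counts


A = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
print(jordan_block_counts(A, 1), jordan_block_counts(A, 0))
```

For a scalar that is not an eigenvalue all ranks equal $n$, so the returned dictionary is empty.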
We now consider the powers of a Jordan block $J_d(\lambda) \in K^{d,d}$. Since $I_d$ and $J_d(0)$
commute,

\[ (J_d(\lambda))^k = (\lambda I_d + J_d(0))^k = \sum_{j=0}^{k} \binom{k}{j} \lambda^{k-j} (J_d(0))^j = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda)}{j!} (J_d(0))^j, \]

for every $k \in \mathbb{N}_0$, where $p^{(j)}$ is the $j$th derivative of the polynomial $p = t^k$ with
respect to $t$,

\[ p^{(0)} = t^k, \qquad p^{(j)} = k(k-1)\cdots(k-j+1)\, t^{k-j}, \quad j = 1, \dots, k. \]

We can now easily show the following result.


Lemma 16.14 If $p \in K[t]$ is a polynomial of degree $k \ge 0$, then

\[ p(J_d(\lambda)) = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda)}{j!} (J_d(0))^j. \tag{16.8} \]


Proof Exercise. □
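Formula (16.8) can be checked on small examples. The Python sketch below is only a sanity check under our own conventions: polynomials are lists of integer coefficients with `coeffs[k]` the coefficient of $t^k$, and all function names are hypothetical. It compares a direct evaluation of $p(J_d(\lambda))$ by Horner's rule with the right hand side of (16.8), using that $(J_d(0))^j$ has ones exactly on the $j$th superdiagonal.

```python
from math import comb


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def jordan_block(lam, d):
    """The Jordan block J_d(lam)."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(d)]
            for i in range(d)]


def poly_at_matrix(coeffs, M):
    """Evaluate p(M) by Horner's rule."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result


def taylor_coefficient(coeffs, lam, j):
    """p^(j)(lam) / j!, computed as sum_k coeffs[k] * C(k, j) * lam^(k - j)."""
    return sum(c * comb(k, j) * lam ** (k - j)
               for k, c in enumerate(coeffs) if k >= j)


def poly_at_jordan_block(coeffs, lam, d):
    """Right hand side of (16.8): entry (i, i + j) equals p^(j)(lam) / j!."""
    return [[taylor_coefficient(coeffs, lam, j - i) if j >= i else 0
             for j in range(d)] for i in range(d)]


cube = [0, 0, 0, 1]  # p = t^3
print(poly_at_matrix(cube, jordan_block(2, 3)) == poly_at_jordan_block(cube, 2, 3))
```

With $p = (t-2)^2$, i.e. `coeffs = [4, -4, 1]`, the same functions show that $p(J_2(2)) = 0$ while $p(J_3(2)) \neq 0$.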


Considered as a linear map from $K^{d,1}$ to $K^{d,1}$, the matrix $J_d(0)$ represents an
“upshift”, since

\[ J_d(0) \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{bmatrix} = \begin{bmatrix} \alpha_2 \\ \vdots \\ \alpha_d \\ 0 \end{bmatrix}
\quad \text{for all} \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{bmatrix} \in K^{d,1}. \]

Clearly,

\[ (J_d(0))^{\ell} \neq 0, \quad \ell = 0, 1, \dots, d-1, \qquad (J_d(0))^d = 0, \]

and hence the linear map $J_d(0)$ is nilpotent of index $d$. The sum on the right hand
side of (16.8) therefore has at most $d$ terms, even when $\deg(p) > d$.

Moreover, the right hand side of (16.8) shows that $p(J_d(\lambda))$ is an upper triangular
matrix with constant entries on its diagonals. A matrix with constant diagonals is
called a Toeplitz matrix.³ In particular, on the main diagonal we have the entry $p(\lambda)$.
From (16.8) we see that $p(J_d(\lambda)) = 0$ holds if and only if

\[ p(\lambda) = p'(\lambda) = \dots = p^{(d-1)}(\lambda) = 0. \]

Thus we have shown the following result.


³ Otto Toeplitz (1881–1940).
Lemma 16.15 Let $p \in K[t]$ be a polynomial and $J_d(\lambda) \in K^{d,d}$ be a Jordan block.

(1) The matrix $p(J_d(\lambda))$ is invertible if and only if $\lambda$ is not a root of $p$.
(2) We have $p(J_d(\lambda)) = 0 \in K^{d,d}$ if and only if $\lambda$ is a $d$-fold root of $p$, i.e., if the
linear factor $(t - \lambda)^d$ is a divisor of $p$.


Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$, where we
do not assume that $P_f$ decomposes into linear factors. From the Cayley-Hamilton
theorem (Theorem 8.6) we know that $P_f(f) = 0 \in \mathcal{L}(V, V)$, i.e., there exists a monic
polynomial of degree at most $\dim(V)$, which annihilates the endomorphism $f$. Let
$p_1, p_2 \in K[t]$ be two monic polynomials of smallest possible degree with $p_1(f) =
p_2(f) = 0$. Then $(p_1 - p_2)(f) = 0$, and since $p_1$ and $p_2$ are monic, $p_1 - p_2 \in K[t]$ is
a polynomial with $\deg(p_1 - p_2) < \deg(p_1) = \deg(p_2)$. The minimality assumption
on $\deg(p_1)$ and $\deg(p_2)$ implies that $p_1 - p_2 = 0$, i.e., $p_1 = p_2$. Thus, for every
$f \in \mathcal{L}(V, V)$ there exists a uniquely determined monic polynomial of minimal degree
which annihilates $f$. This justifies the following definition.


Definition 16.16 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then
the uniquely determined monic polynomial of minimal degree that annihilates $f$ is
called the minimal polynomial of $f$. We denote this polynomial by $M_f$.


By construction we always have $\deg(M_f) \le \deg(P_f) = \dim(V)$.


Lemma 16.17 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then
the minimal polynomial $M_f$ divides every polynomial that annihilates $f$ and is, in
particular, a divisor of the characteristic polynomial $P_f$.


Proof For $p = 0$ we have $p(f) = 0$ and $M_f$ divides $p$. If $p \in K[t] \setminus \{0\}$ is a
polynomial with $p(f) = 0$, then $\deg(M_f) \le \deg(p)$. Using division with remainder
(cp. Theorem 15.4), there exist uniquely determined polynomials $q, r \in K[t]$ with
$p = q \cdot M_f + r$ and $\deg(r) < \deg(M_f)$. Thus,

\[ 0 = p(f) = q(f)\, M_f(f) + r(f) = r(f). \]

The minimality of $\deg(M_f)$ implies that $r = 0$, and hence $M_f$ divides $p$. □


If $P_f$ decomposes into linear factors, then we can explicitly construct $M_f$ using
the Jordan canonical form of $f$.


Lemma 16.18 Let $V$ be a finite dimensional $K$-vector space. If $f \in L(V, V)$ has a
Jordan canonical form with pairwise distinct eigenvalues $\tilde\lambda_1, \dots, \tilde\lambda_k$ and if $\tilde d_1, \dots, \tilde d_k$
are the respective maximal sizes of the corresponding Jordan blocks, then

$$M_f = \prod_{j=1}^{k} (t - \tilde\lambda_j)^{\tilde d_j}.$$

Proof We know from Lemma 16.17 that $M_f$ is a divisor of $P_f$. Therefore,

$$M_f = \prod_{j=1}^{k} (t - \tilde\lambda_j)^{\ell_j}$$

for some exponents $\ell_1, \dots, \ell_k$. If

$$A = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix}$$

is a Jordan canonical form of $f$, then $M_f(f) = 0 \in L(V, V)$ is equivalent to
$M_f(A) = 0 \in K^{n,n}$, where $n = \dim(V)$. We have $M_f(A) = 0$ if and only if
$M_f(J_{d_j}(\lambda_j)) = 0$ for $j = 1, \dots, m$. For this it is necessary and sufficient that
$M_f(J_{\tilde d_j}(\tilde\lambda_j)) = 0$ for $j = 1, \dots, k$. By Lemma 16.15 this holds if and only if each
of the factors $(t - \tilde\lambda_j)^{\tilde d_j}$, $j = 1, \dots, k$, is a divisor of $M_f$. Therefore, $M_f$ has
the desired form. □
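
The statement of Lemma 16.18 can be checked on a small case. The following Python sketch is only an illustration: the matrix `A` is a hypothetical Jordan matrix with blocks $J_2(1)$, $J_1(1)$, $J_2(0)$ chosen for this check, and the helper names are not taken from the text. It evaluates the product $\prod_j (A - \tilde\lambda_j I)^{e_j}$ for the exponents predicted by the lemma and for two smaller choices.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(X, lam):
    """Return X - lam * I."""
    n = len(X)
    return [[X[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def is_zero(X):
    return all(x == 0 for row in X for x in row)

def evaluate_factored(A, factors):
    """Evaluate the product of (A - lam I)^e over factors = [(lam, e), ...]."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam, e in factors:
        for _ in range(e):
            result = matmul(result, shift(A, lam))
    return result

# Hypothetical example: Jordan blocks J_2(1), J_1(1), J_2(0).
A = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]

annihilates = is_zero(evaluate_factored(A, [(1, 2), (0, 2)]))  # (t-1)^2 t^2
too_small_1 = is_zero(evaluate_factored(A, [(1, 1), (0, 2)]))  # (t-1) t^2
too_small_2 = is_zero(evaluate_factored(A, [(1, 2), (0, 1)]))  # (t-1)^2 t
print(annihilates, too_small_1, too_small_2)
```

The largest block for each eigenvalue has size 2, so only the polynomial with both exponents equal to 2 annihilates `A`; lowering either exponent leaves a nonzero matrix.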


Example 16.19 If f is an endomorphism with the Jordan canonical form A in (16.7), 
then 
Pp=(t—1pt?, M;=(t-17t 


00 12 
0 1 
M;(A) = (A-1: 5) A? = 


which shows that $M_f(A) = 0 \in \mathbb{R}^{5,5}$ and $M_f(f) = 0 \in L(V, V)$.


The Jordan canonical form is of great importance in theoretical Linear Algebra. 
In practical applications, however, where usually matrices over $K = \mathbb{R}$ or $K = \mathbb{C}$
are considered, it is not so relevant, since there is no numerically stable method for 
computing the Jordan canonical form of a general matrix in finite precision arithmetic. 
The reason for the lack of such a method is that the entries of the Jordan canonical 
form do not depend continuously on the entries of the given matrix. 


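Example 16.20 Consider the matrix

$$A(\varepsilon) = \begin{bmatrix} \varepsilon & 1 \\ 0 & 0 \end{bmatrix}, \quad \varepsilon \in \mathbb{R}.$$

For every given $\varepsilon \neq 0$, the matrix $A(\varepsilon)$ has the two distinct eigenvalues $\varepsilon$ and $0$, and
hence the diagonal matrix

$$J(\varepsilon) = \begin{bmatrix} \varepsilon & 0 \\ 0 & 0 \end{bmatrix}$$

is a Jordan canonical form of $A(\varepsilon)$. However, for $\varepsilon \to 0$, we obtain

$$A(\varepsilon) \to \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad J(\varepsilon) \to \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Thus, $J(\varepsilon)$ does not converge to the Jordan canonical form of $A(0)$ for $\varepsilon \to 0$.

A similar example is given by the matrices in Exercise 8.5: While $A(0)$ is a
Jordan block of size $n$ corresponding to the eigenvalue 1, for every $\varepsilon \neq 0$ we obtain
a diagonalizable matrix $A(\varepsilon) \in \mathbb{C}^{n,n}$ with $n$ pairwise distinct eigenvalues.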


MATLAB-Minute. 
et 


A=T"! F EE 4 
tl 

where $T \in \mathbb{C}^{2,2}$ is a random matrix constructed with the command T=rand(2).
Construct several such matrices and always compute the eigenvalues
using the command eig(A). Display the eigenvalues in format long.

One observes that the two eigenvalues are real or complex conjugates, and that
they always have an error starting from the 8th digit after the decimal point,
i.e., an error on the order of $10^{-8}$. This does not happen by chance, but is
due to the behavior of the eigenvalues under perturbations, which arise from
rounding errors in the computer.
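
The size of this error can be made plausible with a short computation. The sketch below is an assumption-laden illustration, not the experiment from the box above: it takes the Jordan block $J_2(1)$ as a stand-in and perturbs its $(2,1)$ entry by $\delta$, so that the characteristic polynomial becomes $(t-1)^2 - \delta$. The function name and the choice $\delta = 10^{-16}$ (roughly one rounding error in double precision) are hypothetical.

```python
import math

def eigenvalues_perturbed_block(delta):
    """Eigenvalues of [[1, 1], [delta, 1]] for delta >= 0.

    The characteristic polynomial is (t - 1)^2 - delta, so the
    eigenvalues are 1 - sqrt(delta) and 1 + sqrt(delta).
    """
    root = math.sqrt(delta)
    return 1.0 - root, 1.0 + root

delta = 1e-16  # assumed size of a single rounding error
low, high = eigenvalues_perturbed_block(delta)
shift = high - 1.0
print(f"perturbation {delta:.0e} -> eigenvalue shift {shift:.2e}")
```

A perturbation of size $\delta$ moves the double eigenvalue by $\sqrt{\delta}$, which for $\delta \approx 10^{-16}$ is about $10^{-8}$.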


16.3 Computation of the Jordan Canonical Form 


We now derive a method for the computation of the Jordan canonical form of an
endomorphism $f$ on a finite dimensional $K$-vector space $V$. We assume that $P_f$
decomposes into linear factors over $K$, and that the roots of $P_f$, i.e., the eigenvalues
of $f$, are known. The construction follows the important steps in the existence proof
of the Jordan canonical form in Sect. 16.2.

Suppose that $\lambda$ is an eigenvalue of $f$ and that $f$ has a corresponding Jordan block of
size $s$. Then there exist $s$ linearly independent vectors $t_1, \dots, t_s$ with $[f]_{B,B} = J_s(\lambda)$
for $B = \{t_1, \dots, t_s\}$. With $t_0 := 0$ and writing $\mathrm{Id}$ instead of $\mathrm{Id}_V$ for simplicity of
notation, we then have

$$(f - \lambda\,\mathrm{Id})(t_1) = t_0,$$
$$(f - \lambda\,\mathrm{Id})(t_2) = t_1,$$
$$\vdots$$
$$(f - \lambda\,\mathrm{Id})(t_s) = t_{s-1},$$

hence $t_{s-j} = (f - \lambda\,\mathrm{Id})^j(t_s)$ for $j = 0, 1, \dots, s$.
The vectors $t_s, t_{s-1}, \dots, t_1$ form a sequence as the one we have constructed in the
context of the Krylov subspaces, and

$$\mathrm{span}\{t_s, t_{s-1}, \dots, t_1\} = \mathcal{K}_s(f - \lambda\,\mathrm{Id}, t_s).$$

The reverse sequence

$$t_1, t_2, \dots, t_s$$

is called a Jordan chain of $f$ corresponding to the eigenvalue $\lambda$. The vector $t_1$ is an
eigenvector of $f$ corresponding to $\lambda$. For the vector $t_2$ we then have $(f - \lambda\,\mathrm{Id})(t_2) \neq 0$
and

$$(f - \lambda\,\mathrm{Id})^2(t_2) = (f - \lambda\,\mathrm{Id})(t_1) = 0.$$

Hence $t_2 \in \ker((f - \lambda\,\mathrm{Id})^2) \setminus \ker(f - \lambda\,\mathrm{Id})$, and in general

$$t_j \in \ker((f - \lambda\,\mathrm{Id})^j) \setminus \ker((f - \lambda\,\mathrm{Id})^{j-1}), \quad j = 1, \dots, s.$$

This motivates the following definition. 


Definition 16.21 Let $V$ be a finite dimensional $K$-vector space, let $f \in L(V, V)$
have the eigenvalue $\lambda \in K$, and let $k \in \mathbb{N}$. A vector $v \in V$ with

$$v \in \ker((f - \lambda\,\mathrm{Id})^k) \setminus \ker((f - \lambda\,\mathrm{Id})^{k-1})$$

is called a principal vector of level $k$ of $f$ corresponding to the eigenvalue $\lambda$.


Principal vectors of level one are eigenvectors. Principal vectors of higher levels 
can be considered generalizations of eigenvectors, and they are therefore sometimes 
called generalized eigenvectors. 

For the computation of the Jordan canonical form of $f$, we thus need to know the
number and lengths of the Jordan chains corresponding to the different eigenvalues
of $f$. These correspond to the number and sizes of the Jordan blocks of $f$. If $F$ is a
matrix representation of $f$ with respect to an arbitrary basis, then (cp. the proof of
Theorem 16.12)

$$\begin{aligned}
d_s(\lambda) &:= \mathrm{rank}((F - \lambda I)^{s-1}) - \mathrm{rank}((F - \lambda I)^{s}) \\
&= \dim(\mathrm{im}((f - \lambda\,\mathrm{Id})^{s-1})) - \dim(\mathrm{im}((f - \lambda\,\mathrm{Id})^{s})) \\
&= \dim(V) - \dim(\ker((f - \lambda\,\mathrm{Id})^{s-1})) - \bigl(\dim(V) - \dim(\ker((f - \lambda\,\mathrm{Id})^{s}))\bigr) \\
&= \dim(\ker((f - \lambda\,\mathrm{Id})^{s})) - \dim(\ker((f - \lambda\,\mathrm{Id})^{s-1}))
\end{aligned}$$

is the number of Jordan blocks corresponding to $\lambda$ of size at least $s$. This implies, in
particular, that

$$d_s(\lambda) \ge d_{s+1}(\lambda) \ge 0, \quad s = 1, 2, \dots,$$

and $d_s(\lambda) - d_{s+1}(\lambda)$ is the number of Jordan blocks of exact size $s$ corresponding
to $\lambda$. There exists a smallest number $m \in \mathbb{N}$ with

$$\{0\} = \ker((f - \lambda\,\mathrm{Id})^0) \subsetneq \ker((f - \lambda\,\mathrm{Id})^1) \subsetneq \cdots \subsetneq \ker((f - \lambda\,\mathrm{Id})^m) = \ker((f - \lambda\,\mathrm{Id})^{m+1}).$$

Hence $d_s(\lambda) = 0$ for all $s \ge m + 1$, so that there is no Jordan block corresponding
to $\lambda$ of size $m + 1$ or larger.
In order to compute the Jordan canonical form, we therefore proceed as follows:

(1) Determine the eigenvalues of $f$.
(2) For every eigenvalue $\lambda$ of $f$ carry out the following steps:

(a) Determine the smallest number $m \in \mathbb{N}$ with

$$\ker((f - \lambda\,\mathrm{Id})^0) \subsetneq \ker((f - \lambda\,\mathrm{Id})^1) \subsetneq \cdots \subsetneq \ker((f - \lambda\,\mathrm{Id})^m) = \ker((f - \lambda\,\mathrm{Id})^{m+1}).$$

Then $\dim(\ker((f - \lambda\,\mathrm{Id})^m)) = a(\lambda, f)$.

(b) For $s = 1, \dots, m$ determine

$$d_s(\lambda) = \dim(\ker((f - \lambda\,\mathrm{Id})^s)) - \dim(\ker((f - \lambda\,\mathrm{Id})^{s-1})) > 0.$$

If $s \ge m + 1$, then $d_s(\lambda) = 0$, and

$$d_1(\lambda) = \dim(\ker(f - \lambda\,\mathrm{Id})) = g(\lambda, f)$$

is the number of Jordan blocks corresponding to $\lambda$.

(c) To simplify notation, we write $d_s := d_s(\lambda)$ and determine the Jordan chains
as follows:

(i) Since $d_m - d_{m+1} = d_m$, there exist $d_m$ Jordan blocks of size $m$. To start the
corresponding Jordan chains we determine $d_m$ principal vectors of level $m$, i.e., vectors

$$t_{1,m}, t_{2,m}, \dots, t_{d_m,m} \in \ker((f - \lambda\,\mathrm{Id})^m) \setminus \ker((f - \lambda\,\mathrm{Id})^{m-1})$$

with the following property: If $\alpha_1, \dots, \alpha_{d_m} \in K$ with
$\sum_{i=1}^{d_m} \alpha_i t_{i,m} \in \ker((f - \lambda\,\mathrm{Id})^{m-1})$, then $\alpha_1 = \cdots = \alpha_{d_m} = 0$.
Here the first index in $t_{i,j}$ indicates the number of the chain, and the second
indicates the level of the principal vector (from $\ker((f - \lambda\,\mathrm{Id})^j)$ and not
$\ker((f - \lambda\,\mathrm{Id})^{j-1})$).

(ii) For $j = m, m-1, \dots, 2$ we proceed as follows:
When we have determined $d_j$ principal vectors of level $j$, say $t_{1,j}, t_{2,j}, \dots, t_{d_j,j}$,
we apply $f - \lambda\,\mathrm{Id}$ to each of these vectors, hence

$$t_{i,j-1} := (f - \lambda\,\mathrm{Id})(t_{i,j}), \quad i = 1, \dots, d_j,$$

in order to determine the principal vectors of level $j - 1$.
If $\alpha_1, \dots, \alpha_{d_j} \in K$ with $\sum_{i=1}^{d_j} \alpha_i t_{i,j-1} \in \ker((f - \lambda\,\mathrm{Id})^{j-2})$, then

$$0 = (f - \lambda\,\mathrm{Id})^{j-2}\Bigl(\sum_{i=1}^{d_j} \alpha_i t_{i,j-1}\Bigr) = (f - \lambda\,\mathrm{Id})^{j-1}\Bigl(\sum_{i=1}^{d_j} \alpha_i t_{i,j}\Bigr),$$

and thus $\sum_{i=1}^{d_j} \alpha_i t_{i,j} \in \ker((f - \lambda\,\mathrm{Id})^{j-1})$, giving $\alpha_1 = \cdots = \alpha_{d_j} = 0$.

If $d_{j-1} > d_j$, then there exist $d_{j-1} - d_j$ Jordan blocks of size $j - 1$. For
these we need Jordan chains of length $j - 1$. Thus we extend the
already computed

$$t_{1,j-1}, t_{2,j-1}, \dots, t_{d_j,j-1} \in \ker((f - \lambda\,\mathrm{Id})^{j-1}) \setminus \ker((f - \lambda\,\mathrm{Id})^{j-2})$$

to $d_{j-1}$ principal vectors of level $j - 1$ (but only if $d_{j-1} > d_j$) via

$$t_{1,j-1}, t_{2,j-1}, \dots, t_{d_{j-1},j-1} \in \ker((f - \lambda\,\mathrm{Id})^{j-1}) \setminus \ker((f - \lambda\,\mathrm{Id})^{j-2}),$$

where the following must hold: If $\alpha_1, \dots, \alpha_{d_{j-1}} \in K$ with
$\sum_{i=1}^{d_{j-1}} \alpha_i t_{i,j-1} \in \ker((f - \lambda\,\mathrm{Id})^{j-2})$, then $\alpha_1 = \cdots = \alpha_{d_{j-1}} = 0$.

After completing the step for $j = 2$, we have obtained (linearly independent)
vectors $t_{1,1}, t_{2,1}, \dots, t_{d_1,1} \in \ker(f - \lambda\,\mathrm{Id})$. Since $\dim(\ker(f - \lambda\,\mathrm{Id})) = d_1$,
we have found a basis of $\ker(f - \lambda\,\mathrm{Id})$. In this way we have determined $d_1$
different Jordan chains that we combine as follows:

$$T_\lambda := \{ t_{1,1}, t_{1,2}, \dots, t_{1,m};\ t_{2,1}, t_{2,2}, \dots, t_{2,*};\ \dots;\ t_{d_1,1}, \dots, t_{d_1,*} \},$$

where $*$ stands for the length of the respective chain.
Each chain begins with an eigenvector, followed by principal vectors of
increasing levels. Here we use the convention that the chains are ordered
decreasingly according to their length.

(3) Jordan chains are linearly independent if their first vectors (the eigenvectors)
are linearly independent. (Show this as an exercise.) Thus, if $\lambda_1, \dots, \lambda_\ell$ are the
pairwise distinct eigenvalues of $f$, then

$$T = \{ T_{\lambda_1}, \dots, T_{\lambda_\ell} \}$$

is a basis, for which $[f]_{T,T}$ is in Jordan canonical form.


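Example 16.22 We interpret the matrix

$$F = \begin{bmatrix} 5 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{5,5}$$

as an endomorphism on $\mathbb{R}^{5,1}$.

(1) The eigenvalues of $F$ are the roots of $P_F = (t - 1)^2 (t - 4)^3$. In particular $P_F$
decomposes into linear factors and $F$ has a Jordan canonical form.
(2) We now consider the different eigenvalues of $F$:

(a) For the eigenvalue $\lambda_1 = 1$ we obtain

$$\ker(F - I) = \ker \begin{bmatrix} 4 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix} = \mathrm{span}\{e_2, e_4\}.$$

Here $\dim(\ker(F - I)) = 2 = a(1, F)$.
For the eigenvalue $\lambda_2 = 4$ we obtain

$$\ker(F - 4I) = \ker \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & -3 & 0 & 0 & 0 \\ -1 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = \mathrm{span}\{e_1 - e_3, e_5\},$$

$$\ker((F - 4I)^2) = \ker \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 9 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = \mathrm{span}\{e_1, e_3, e_5\}.$$

Here $\dim(\ker((F - 4I)^2)) = 3 = a(4, F)$.

(b) For $\lambda_1 = 1$ we have $d_1(1) = \dim(\ker(F - I)) = 2$.
For $\lambda_2 = 4$ we have $d_1(4) = \dim(\ker(F - 4I)) = 2$ and
$d_2(4) = \dim(\ker((F - 4I)^2)) - \dim(\ker(F - 4I)) = 3 - 2 = 1$.

(c) Computation of the Jordan chains:
• For $\lambda_1 = 1$ we have $m = 1$. As principal vectors of level one we choose
$t_{1,1} = e_2$ and $t_{2,1} = e_4$. These form a basis of $\ker(F - I)$: If $\alpha_1, \alpha_2 \in \mathbb{R}$
with $\alpha_1 e_2 + \alpha_2 e_4 = 0$, then $\alpha_1 = \alpha_2 = 0$. For $\lambda_1 = 1$ we are finished.
• For $\lambda_2 = 4$ we have $m = 2$, and we choose a principal vector of level
two, say $t_{1,2} = e_1$. For this vector we have: If $\alpha_1 \in \mathbb{R}$ with
$\alpha_1 e_1 \in \mathrm{span}\{e_1 - e_3, e_5\}$, then $\alpha_1 = 0$. We compute

$$t_{1,1} := (F - 4I)\, t_{1,2} = e_1 - e_3.$$

Since $d_1(4) = 2 > 1 = d_2(4)$, we have to add to $t_{1,1}$ another principal
vector of level one, and we choose $t_{2,1} = e_5$. Since the vectors are linearly
independent, $\alpha_1 t_{1,1} + \alpha_2 t_{2,1} \in \ker((F - 4I)^0) = \{0\}$ implies that
$\alpha_1 = \alpha_2 = 0$.

In this way we get

$$T_1 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad T_2 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

(3) The coordinate transformation matrix is $T = [T_1, T_2]$, and the Jordan canonical
form of $F$ is

$$\begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 4 & 1 & \\ & & 0 & 4 & \\ & & & & 4 \end{bmatrix} = T^{-1} F T, \quad \text{where} \quad T^{-1} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$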
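
Step (2)(b) of the method lends itself to a small computational check. The following Python sketch computes the numbers $d_s(\lambda)$ from ranks of powers of $F - \lambda I$, using exact rational arithmetic so that no rounding enters the rank decisions. It is a minimal illustration under the assumption that the eigenvalues are already known exactly; the function names `rank` and `block_counts` are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def block_counts(F, lam):
    """Return [d_1, ..., d_m], d_s = rank((F - lam I)^(s-1)) - rank((F - lam I)^s)."""
    n = len(F)
    N = [[F[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # (F - lam I)^0
    prev_rank, counts = n, []
    while True:
        power = matmul(power, N)
        d = prev_rank - rank(power)
        if d == 0:          # kernels have become stationary
            return counts
        counts.append(d)
        prev_rank -= d

# The matrix F of Example 16.22.
F = [[5, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [-1, 0, 3, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 4]]

print(block_counts(F, 1), block_counts(F, 4))
```

For $\lambda = 1$ the list has the single entry $d_1 = 2$, and for $\lambda = 4$ it contains $d_1 = 2$ and $d_2 = 1$, in agreement with step (b) of the example. The differences $d_s - d_{s+1}$ then give the number of blocks of exact size $s$.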
Exercises


(In the following exercises $K$ is an arbitrary field.)

16.1. Prove Lemma 16.1 (1).

16.2. Prove Lemma 16.6 (1).

16.3. Let $V$ be a $K$-vector space, $f \in L(V, V)$ and $\lambda \in K$. Prove or disprove: A
subspace $U \subseteq V$ is $f$-invariant, if it is $(f - \lambda\,\mathrm{Id}_V)$-invariant.

16.4. Let $V$ be a finite dimensional $K$-vector space, $f \in L(V, V)$, $v \in V$ and
$\lambda \in K$. Show that $\mathcal{K}_j(f, v) = \mathcal{K}_j(f - \lambda\,\mathrm{Id}_V, v)$ for all $j \in \mathbb{N}$. Conclude
that the grade of $v$ with respect to $f$ is equal to the grade of $v$ with respect to
$f - \lambda\,\mathrm{Id}_V$.

16.5. Prove Lemma 16.14.

16.6. Let $V$ be a finite dimensional Euclidean or unitary vector space and let
$f \in L(V, V)$ be selfadjoint and nilpotent. Show that then $f = 0$.

16.7. Let $V \neq \{0\}$ be a finite dimensional $K$-vector space, let $f \in L(V, V)$ be
nilpotent of index $m$ and suppose that $P_f$ decomposes into linear factors.
Show the following assertions:

(a) $P_f = t^n$ with $n = \dim(V)$.

(b) $M_f = t^m$.

(c) There exists a vector $v \in V$ of grade $m$ with respect to $f$.

(d) For every $\lambda \in K$ we have $M_{f - \lambda\,\mathrm{Id}_V} = (t + \lambda)^m$.

16.8. Let $V$ be a finite dimensional $K$-vector space and $f \in L(V, V)$. Show the
following assertions:

(a) $\ker(f^j) \subseteq \ker(f^{j+1})$ for all $j \ge 0$ and there exists an $m \ge 0$ with
$\ker(f^m) = \ker(f^{m+1})$. For this $m$ we have $\ker(f^m) = \ker(f^{m+j})$ for all
$j \ge 1$.

(b) $\mathrm{im}(f^j) \supseteq \mathrm{im}(f^{j+1})$ for all $j \ge 0$ and there exists an $\ell \ge 0$ with
$\mathrm{im}(f^\ell) = \mathrm{im}(f^{\ell+1})$. For this $\ell$ we have $\mathrm{im}(f^\ell) = \mathrm{im}(f^{\ell+j})$ for all
$j \ge 1$.

(c) If $m, \ell \ge 0$ are minimal with $\ker(f^m) = \ker(f^{m+1})$ and
$\mathrm{im}(f^\ell) = \mathrm{im}(f^{\ell+1})$, then $m = \ell$.

(Theorem 16.5 now implies that $V = \ker(f^m) \oplus \mathrm{im}(f^m)$ is a decomposition
of $V$ into $f$-invariant subspaces.)

16.9. Let $V$ be a finite dimensional $K$-vector space and let $f \in L(V, V)$ be a
projection (cp. Exercise 13.10). Show the following assertions:

(a) $v \in \mathrm{im}(f)$ implies that $f(v) = v$.

(b) $V = \mathrm{im}(f) \oplus \ker(f)$.

(c) There exists a basis $B$ of $V$ with

$$[f]_{B,B} = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix},$$

where $k = \dim(\mathrm{im}(f))$ and $n = \dim(V)$. In particular, $P_f = (t - 1)^k t^{n-k}$
and $\lambda \in \{0, 1\}$ for every eigenvalue $\lambda$ of $f$.

(d) The map $g = \mathrm{Id}_V - f$ is a projection with $\ker(g) = \mathrm{im}(f)$ and
$\mathrm{im}(g) = \ker(f)$.

16.10. Let $V$ be a finite dimensional $K$-vector space and let $U, W \subseteq V$ be two
subspaces with $V = U \oplus W$. Show that there exists a uniquely determined
projection $f \in L(V, V)$ with $\mathrm{im}(f) = U$ and $\ker(f) = W$.

16.11. Determine the Jordan canonical form of the matrices


Lae 6 2 10 00 
cane. D =i 11 00 
A= ce R44, B=|-1 03 00|ceR” 
a 03.43 
5 es =—|=—10 Ti 
ee We es E 


using the method presented in Sect. 16.3. Determine also the minimal polynomial.

16.12. Determine the Jordan canonical form and the minimal polynomial of the
linear map

T: C<3lt] —> C<3lt], Qo + Qt + ant? + (iat > GQ) + Qot + ant. 


16.13. Determine (up to the order of blocks) all matrices $J$ in Jordan canonical form
with P; = (t+ 1 (t — 1) and My = (t + 1) (t — 1)”. 

16.14. Let $V \neq \{0\}$ be a finite dimensional $K$-vector space, $f \in L(V, V)$, and suppose
that $P_f$ decomposes into linear factors. Show the following assertions:

(a) $P_f = M_f$ holds if and only if $g(\lambda, f) = 1$ for all eigenvalues $\lambda$ of $f$.

(b) $f$ is diagonalizable if and only if $M_f$ has only simple roots, i.e., roots
with multiplicity one.

(c) A root $\lambda \in K$ of $M_f$ is simple if and only if $\ker(f - \lambda\,\mathrm{Id}_V) =
\ker((f - \lambda\,\mathrm{Id}_V)^2)$.

16.15. Let $V$ be a $K$-vector space of dimension 2 or 3 and let $f \in L(V, V)$ with $P_f$
decomposing into linear factors. Show that the Jordan canonical form of $f$
is uniquely determined by $P_f$ and $M_f$. Why does this not hold any longer if
$\dim(V) \ge 4$?

16.16. Let $A \in K^{n,n}$ be a matrix for which the characteristic polynomial decomposes
into linear factors. Show that there exists a diagonalizable matrix $D$ and a
nilpotent matrix $N$ with $A = D + N$ and $DN = ND$.

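16.17. Let $A \in K^{n,n}$ be a matrix that has a Jordan canonical form. We define

$$I_n^R := [\delta_{i,n+1-j}] = \begin{bmatrix} & & 1 \\ & ⋰ & \\ 1 & & \end{bmatrix}, \qquad J_n^R(\lambda) := \begin{bmatrix} & & & \lambda \\ & & \lambda & 1 \\ & ⋰ & ⋰ & \\ \lambda & 1 & & \end{bmatrix} \in K^{n,n}.$$

Show the following assertions:

(a) $I_n^R J_n(\lambda) I_n^R = J_n(\lambda)^T$.

(b) $A$ and $A^T$ are similar.

(c) $J_n(\lambda) = I_n^R J_n^R(\lambda)$.

(d) $A$ can be written as a product of two symmetric matrices.

16.18. Determine for the matrix

$$A = \begin{bmatrix} 5 & 1 & 1 \\ 0 & 5 & 1 \\ 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{3,3}$$

two symmetric matrices $S_1, S_2 \in \mathbb{R}^{3,3}$ with $A = S_1 S_2$.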


Chapter 17 
Matrix Functions and Systems 
of Differential Equations 


In this chapter we give an introduction to the area of matrix functions. We first define 
general matrix functions and derive their most important properties. Using the exam- 
ples of network analysis and chemical reactions, we illustrate how matrix functions 
arise naturally in applications. The network analysis example involves the exponen- 
tial function of matrices, and we study the properties of this important function in 
detail. The analysis of chemical reaction kinetics leads to a system of ordinary differ- 
ential equations, whose solution again is based on the matrix exponential function. 


17.1 Matrix Functions and the Matrix Exponential 
Function 


In the following we will study functions that yield for a given $n \times n$ matrix again an
$n \times n$ matrix. A possible definition of such a function is given by the entrywise
application of scalar functions to the matrix. For instance, one could define for
$A = [a_{ij}] \in \mathbb{C}^{n,n}$ the function $\sin(A)$ by $\sin(A) := [\sin(a_{ij})]$. However, such a
definition is not compatible with the matrix multiplication, since in general already
$A^2 \neq [a_{ij}^2]$.

The following definition of the primary matrix function from [Hig08, Definition 1.1–1.2]
will turn out to be consistent with the matrix multiplication. Since
this definition is based on the Jordan canonical form, we assume for simplicity that
$A \in \mathbb{C}^{n,n}$. Our considerations also apply to square matrices over $\mathbb{R}$, as long as they
have a Jordan canonical form.


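Definition 17.1 Let $A \in \mathbb{C}^{n,n}$ have the Jordan canonical form

$$J = \mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) = S^{-1} A S,$$

and let $\Omega \subset \mathbb{C}$ be such that $\{\lambda_1, \dots, \lambda_m\} \subseteq \Omega$. A function $f : \Omega \to \mathbb{C}$ is said to be
defined on the spectrum of $A$, if the values

$$f^{(j)}(\lambda_i) \quad \text{for } i = 1, \dots, m \text{ and } j = 0, 1, \dots, d_i - 1 \tag{17.1}$$

exist. Here $f^{(j)}(\lambda_i)$, $j = 1, \dots, d_i - 1$, is the $j$th derivative of the function $f(\lambda)$
with respect to $\lambda$ evaluated at $\lambda_i$. If $\lambda_i \in \mathbb{R}$, then this is the real derivative, and for
$\lambda_i \in \mathbb{C} \setminus \mathbb{R}$ it is the complex derivative. Moreover, we assume that equal eigenvalues
that occur in different Jordan blocks are mapped to the same values in (17.1).

If $f$ is defined on the spectrum of $A$, then the primary matrix function $f(A)$ is
defined by

$$f(A) := S f(J) S^{-1}, \quad \text{where} \quad f(J) := \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) \tag{17.2}$$

and

$$f(J_{d_i}(\lambda_i)) := \begin{bmatrix} f(\lambda_i) & f'(\lambda_i) & \frac{f''(\lambda_i)}{2!} & \cdots & \frac{f^{(d_i-1)}(\lambda_i)}{(d_i-1)!} \\ & f(\lambda_i) & f'(\lambda_i) & \ddots & \vdots \\ & & \ddots & \ddots & \frac{f''(\lambda_i)}{2!} \\ & & & f(\lambda_i) & f'(\lambda_i) \\ & & & & f(\lambda_i) \end{bmatrix} \quad \text{for } i = 1, \dots, m. \tag{17.3}$$

Note that for the definition of $f(A)$ in (17.2)–(17.3) only the existence of the
values in (17.1) is required.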
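
To make the structure of (17.3) concrete, the following Python sketch assembles $f(J_d(\lambda))$ from a list of derivative values and compares it, for the polynomial $f(z) = z^3$, with the third power of the Jordan block computed by matrix multiplication. The helper names and the test data ($\lambda = 2$, $d = 3$) are assumptions chosen for illustration only.

```python
from fractions import Fraction
from math import factorial

def function_of_jordan_block(derivatives):
    """Build f(J_d(lam)) as in (17.3).

    derivatives[j] is assumed to hold f^(j)(lam) for j = 0, ..., d-1.
    Entry (i, k) with k >= i equals f^(k-i)(lam) / (k-i)!, and all
    entries below the diagonal are zero.
    """
    d = len(derivatives)
    return [[Fraction(derivatives[k - i], factorial(k - i)) if k >= i
             else Fraction(0) for k in range(d)] for i in range(d)]

def jordan_block(lam, d):
    return [[lam if i == k else 1 if k == i + 1 else 0 for k in range(d)]
            for i in range(d)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

lam, d = 2, 3
# f(z) = z^3: f(2) = 8, f'(2) = 3 * 2^2 = 12, f''(2) = 6 * 2 = 12.
from_formula = function_of_jordan_block([8, 12, 12])

J = jordan_block(lam, d)
J_cubed = matmul(matmul(J, J), J)
print(from_formula == J_cubed)
```

Both constructions give the same upper triangular Toeplitz matrix with first row $8, 12, 6$, which illustrates on one instance why the derivative-based definition is compatible with matrix multiplication.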


Example 17.2 Let $A = I_2 \in \mathbb{C}^{2,2}$ and let $f(z) = \sqrt{z}$ (the square root function).
If we set $f(1) = \sqrt{1} = +1$, then $f(A) = \sqrt{A} = I_2$ by Definition 17.1. If we
choose the other branch of the square root function, i.e., $f(1) = \sqrt{1} = -1$, then
$f(A) = \sqrt{A} = -I_2$. The matrices $I_2$ and $-I_2$ are primary square roots of $A = I_2$.
Taking different branches of a function for different Jordan blocks corresponding to
the same eigenvalue is incompatible with Definition 17.1. For instance, the matrices

$$X_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad X_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$$

are incompatible with Definition 17.1, despite the fact that $X_1^2 = I_2$ and $X_2^2 = I_2$.

All solutions $X \in \mathbb{C}^{n,n}$ of the matrix equation $X^2 = A$ are called square roots of
the matrix $A \in \mathbb{C}^{n,n}$. But as Example 17.2 shows, some of these may not be primary
square roots according to Definition 17.1. In the following, by $f(A)$ we will always
mean a primary matrix function according to Definition 17.1, and will usually omit
the term “primary”.

In (16.8) we have shown that for each polynomial $p \in \mathbb{C}[t]$ of degree $k \ge 0$ we
have

$$p(J_{d_i}(\lambda_i)) = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda_i)}{j!} (J_{d_i}(0))^j. \tag{17.4}$$

A simple comparison shows that this formula agrees with (17.3) for $f = p$. This
means that the computation of $p(J_{d_i}(\lambda_i))$ with (17.4) leads to the same result as the
definition of $p(J_{d_i}(\lambda_i))$ by (17.3). More generally, the following result holds.


Lemma 17.3 Let $A \in \mathbb{C}^{n,n}$ and $p = \alpha_k t^k + \dots + \alpha_1 t + \alpha_0 \in \mathbb{C}[t]$. Then (17.2)–(17.3)
with $f = p$ yields a matrix function $f(A)$ that satisfies
$f(A) = \alpha_k A^k + \dots + \alpha_1 A + \alpha_0 I_n$.

Proof Exercise. □


If we consider, in particular, the polynomial $f = t^2$ in (17.2)–(17.3), then the
resulting $f(A)$ is equal to the product $A \cdot A$. This shows that the definition of the
primary matrix function $f(A)$ is consistent with the matrix multiplication.

The following theorem, which is of great practical and theoretical importance, 
shows that the matrix f(A) can always be written as a polynomial in A. 


Theorem 17.4 Let $A \in \mathbb{C}^{n,n}$ have the minimal polynomial $M_A$, and let $f(A)$ be as
in Definition 17.1. Then there exists a uniquely determined polynomial $p \in \mathbb{C}[t]$ of
degree at most $\deg(M_A) - 1$ with $f(A) = p(A)$. In particular, $A f(A) = f(A) A$,
$f(A^T) = f(A)^T$ as well as $f(V A V^{-1}) = V f(A) V^{-1}$ for all $V \in GL_n(\mathbb{C})$.

Proof We will not present the proof here since it requires advanced results from
interpolation theory. Details can be found in [Hig08, Chap. 1]. □


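Using Theorem 17.4 we can show that the primary matrix function $f(A)$ in
Definition 17.1 is independent of the choice of the Jordan canonical form of $A$. We
already know from Theorem 16.12 that the Jordan canonical form of $A$ is unique
up to the order of the Jordan blocks. If

$$J = \mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) = S^{-1} A S,$$
$$\tilde J = \mathrm{diag}(J_{\tilde d_1}(\tilde\lambda_1), \dots, J_{\tilde d_m}(\tilde\lambda_m)) = \tilde S^{-1} A \tilde S$$

are two Jordan canonical forms of $A$, then $\tilde J = P^T J P$ for a permutation matrix
$P \in \mathbb{R}^{n,n}$, where the matrices $J$ and $\tilde J$ are the same up to the order of diagonal
blocks. Hence

$$\begin{aligned}
f(J) &= \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) \\
&= P \bigl(P^T \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) P\bigr) P^T \\
&= P \bigl(\mathrm{diag}(f(J_{\tilde d_1}(\tilde\lambda_1)), \dots, f(J_{\tilde d_m}(\tilde\lambda_m)))\bigr) P^T \\
&= P f(\tilde J) P^T.
\end{aligned}$$

Theorem 17.4 applied to the matrix $J$ yields the existence of a polynomial $p$ with
$f(J) = p(J)$. Thus, we get

$$\begin{aligned}
f(A) &= S f(J) S^{-1} = S p(J) S^{-1} = p(A) = p(\tilde S \tilde J \tilde S^{-1}) = \tilde S P^T p(J) P \tilde S^{-1} = \tilde S P^T f(J) P \tilde S^{-1} \\
&= \tilde S f(\tilde J) \tilde S^{-1}.
\end{aligned}$$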
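Let us now consider the exponential function $f(z) = e^z$ that is infinitely often
complex differentiable throughout $\mathbb{C}$. In particular, $e^z$ is defined (in the sense of
Definition 17.1) on the spectrum of every given matrix

$$A = S\,\mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m))\,S^{-1} \in \mathbb{C}^{n,n}.$$

If $t \in \mathbb{C}$ is arbitrary (but fixed), then the derivatives of the function $e^{tz}$ with respect
to the variable $z$ are given by

$$\frac{d^j}{dz^j} e^{tz} = t^j e^{tz}, \quad j = 0, 1, 2, \dots.$$

We will use the notation $\exp(M)$ instead of $e^M$ for the exponential function of a matrix
$M$. For every Jordan block $J_d(\lambda)$ of $A$ we then have, by (17.3) with $f(z) = e^{tz}$,

$$\exp(tJ_d(\lambda)) = e^{t\lambda} \begin{bmatrix} 1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{d-1}}{(d-1)!} \\ & 1 & t & \ddots & \vdots \\ & & \ddots & \ddots & \frac{t^2}{2!} \\ & & & 1 & t \\ & & & & 1 \end{bmatrix} = e^{t\lambda} \sum_{k=0}^{d-1} \frac{1}{k!} (tJ_d(0))^k, \tag{17.5}$$

and the matrix exponential function $\exp(tA)$ is given by

$$\exp(tA) = S\,\mathrm{diag}(\exp(tJ_{d_1}(\lambda_1)), \dots, \exp(tJ_{d_m}(\lambda_m)))\,S^{-1}. \tag{17.6}$$

The parameter $t$ will be used in the next section in the context of linear differential
equations.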

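In Analysis it is shown that for every $z \in \mathbb{C}$ the function $e^z$ can be represented by
the absolutely convergent series

$$e^z = \sum_{j=0}^{\infty} \frac{z^j}{j!}.$$

Using this series and the equation $(J_d(0))^\ell = 0$ for all $\ell \ge d$, we obtain

$$\begin{aligned}
\exp(tJ_d(\lambda)) &= e^{t\lambda} \sum_{\ell=0}^{d-1} \frac{1}{\ell!} (tJ_d(0))^\ell = \Bigl(\sum_{j=0}^{\infty} \frac{(t\lambda)^j}{j!}\Bigr) \Bigl(\sum_{\ell=0}^{\infty} \frac{1}{\ell!} (tJ_d(0))^\ell\Bigr) \\
&= \sum_{j=0}^{\infty} \Bigl(\sum_{\ell=0}^{j} \frac{(t\lambda)^{j-\ell}}{(j-\ell)!} \cdot \frac{1}{\ell!} (tJ_d(0))^\ell\Bigr) \\
&= \sum_{j=0}^{\infty} \frac{t^j}{j!} \Bigl(\sum_{\ell=0}^{j} \binom{j}{\ell} \lambda^{j-\ell} (J_d(0))^\ell\Bigr) \\
&= \sum_{j=0}^{\infty} \frac{t^j}{j!} (\lambda I_d + J_d(0))^j \\
&= \sum_{j=0}^{\infty} \frac{1}{j!} (tJ_d(\lambda))^j. \tag{17.7}
\end{aligned}$$

In this derivation we have used the absolute convergence of the exponential series
and the finiteness of the series with the matrix $J_d(0)$. This allows the application of
the Cauchy product formula¹ for absolutely convergent series, which is also proven
in Analysis.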


Lemma 17.5 If $A \in \mathbb{C}^{n,n}$, $t \in \mathbb{C}$ and $\exp(tA)$ is the matrix exponential function in
(17.5)–(17.6), then

$$\exp(tA) = \sum_{j=0}^{\infty} \frac{1}{j!} (tA)^j.$$

Proof In (17.7) we have shown this already for Jordan blocks. The assertion then
follows from

$$\sum_{j=0}^{\infty} \frac{1}{j!} (t\,S J S^{-1})^j = S \Bigl(\sum_{j=0}^{\infty} \frac{1}{j!} (tJ)^j\Bigr) S^{-1}$$

and the representation (17.6) of the matrix exponential function. □
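
The agreement between the closed form (17.5) and the series of Lemma 17.5 can be observed numerically on a single Jordan block. The sketch below is an illustration under stated assumptions: the values $\lambda = -0.5$, $d = 3$, $t = 2$ and the truncation after 40 terms are arbitrary choices, and the function names are hypothetical.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Truncated exponential series: sum of M^j / j! for j < terms."""
    n = len(M)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # M^0 / 0!
    total = [row[:] for row in term]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]  # M^j / j!
        total = [[a + b for a, b in zip(r, s)] for r, s in zip(total, term)]
    return total

def exp_jordan_block(lam, d, t):
    """Closed form (17.5): e^(t lam) times the Toeplitz matrix of t^k / k!."""
    scale = math.exp(t * lam)
    return [[scale * t ** (k - i) / math.factorial(k - i) if k >= i else 0.0
             for k in range(d)] for i in range(d)]

lam, d, t = -0.5, 3, 2.0
tJ = [[t * (lam if i == k else 1.0 if k == i + 1 else 0.0) for k in range(d)]
      for i in range(d)]

closed = exp_jordan_block(lam, d, t)
series = exp_series(tJ)
max_diff = max(abs(a - b) for r, s in zip(closed, series) for a, b in zip(r, s))
print(max_diff)
```

For this small, well-scaled block the two results differ only at the level of rounding errors. The truncated series is used here purely as a check and is not meant as a robust way to compute matrix exponentials in general.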


We immediately see from Lemma 17.5 that for a matrix $A \in \mathbb{R}^{n,n}$ and every real
$t$ the matrix exponential function $\exp(tA)$ is a real matrix.

The following result presents further important properties of the matrix exponen- 
tial function. 


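Lemma 17.6 If the two matrices $A, B \in \mathbb{C}^{n,n}$ commute, then $\exp(A + B) =
\exp(A) \exp(B)$. For every matrix $A \in \mathbb{C}^{n,n}$ we have $\exp(A) \in GL_n(\mathbb{C})$ with
$(\exp(A))^{-1} = \exp(-A)$.

¹ Augustin Louis Cauchy (1789–1857).

Proof If $A$ and $B$ commute, then the Cauchy product formula yields

$$\begin{aligned}
\exp(A) \exp(B) &= \Bigl(\sum_{j=0}^{\infty} \frac{1}{j!} A^j\Bigr) \Bigl(\sum_{\ell=0}^{\infty} \frac{1}{\ell!} B^\ell\Bigr) = \sum_{j=0}^{\infty} \Bigl(\sum_{\ell=0}^{j} \frac{1}{\ell!} A^\ell \frac{1}{(j-\ell)!} B^{j-\ell}\Bigr) \\
&= \sum_{j=0}^{\infty} \frac{1}{j!} \Bigl(\sum_{\ell=0}^{j} \binom{j}{\ell} A^\ell B^{j-\ell}\Bigr) = \sum_{j=0}^{\infty} \frac{1}{j!} (A + B)^j \\
&= \exp(A + B).
\end{aligned}$$

Here we have used the binomial formula for commuting matrices (cp. Exercise 4.10).
Since $A$ and $-A$ commute, we have

$$\exp(A) \exp(-A) = \exp(A - A) = \exp(0) = \sum_{j=0}^{\infty} \frac{1}{j!} 0^j = I_n,$$

and hence $\exp(A) \in GL_n(\mathbb{C})$ with $(\exp(A))^{-1} = \exp(-A)$. □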


For non-commuting matrices the statements in Lemma 17.6 in general do not hold 
(cp. Exercise 17.9). 
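
A small numerical illustration of this failure, given as a sketch: the two matrices below are a hypothetical non-commuting pair chosen for this purpose (they are not taken from Exercise 17.9), and the exponential is approximated by a truncated series, which is adequate only for such tiny examples.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Truncated exponential series: sum of M^j / j! for j < terms."""
    n = len(M)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(r, s)] for r, s in zip(total, term)]
    return total

A = [[0, 1], [0, 0]]   # nilpotent, so exp(A) = I + A
B = [[0, 0], [1, 0]]   # nilpotent, so exp(B) = I + B
A_plus_B = [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

commutator_is_zero = matmul(A, B) == matmul(B, A)
product = matmul(exp_series(A), exp_series(B))   # exp(A) exp(B)
exp_of_sum = exp_series(A_plus_B)                # exp(A + B)
print(commutator_is_zero, product[0][0], exp_of_sum[0][0])
```

Here $\exp(A)\exp(B)$ has the $(1,1)$ entry 2, while the $(1,1)$ entry of $\exp(A + B)$ is $\cosh(1) \approx 1.543$, so the two matrices differ.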


MATLAB-Minute. 
Compute the matrix exponential function exp(A) for the matrix 


E R55 


aes 
| 
A 


-1 3 
—2 4 
i 
gh 
J 


bonos 
EE ET 


using the command E1=expm(A). (Look at help expm.)

Also compute the diagonalization of A using the command [S,D]=eig(A),
and form the matrix exponential function exp(A) as E2=S*expm(D)/S.
Compare the matrices E1 and E2 and compute the relative error
norm(E1-E2)/norm(E2). (Look at help norm.)


Example 17.7 Let $A = [a_{ij}] \in \mathbb{C}^{n,n}$ be a symmetric matrix with $a_{ii} = 0$ and
$a_{ij} \in \{0, 1\}$ for all $i, j = 1, \dots, n$. We identify the matrix $A$ with a graph $G_A = (V_A, E_A)$
consisting of a set of $n$ vertices $V_A = \{1, \dots, n\}$ and a set of edges $E_A \subseteq V_A \times V_A$.
For $i = 1, \dots, n$ the row $i$ of $A$ is identified with the vertex $i \in V_A$, and every entry
$a_{ij} = 1$ is identified with an edge $(i, j) \in E_A$. Due to the symmetry of $A$, we have
$a_{ij} = 1$ if and only if $a_{ji} = 1$. We therefore consider in the following the elements
of $E_A$ as unordered pairs, i.e., $(i, j) = (j, i)$. The following example illustrates this
identification:

$$A = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$$

is identified with $G_A = (V_A, E_A)$, where

$$V_A = \{1, 2, 3, 4, 5\}, \quad E_A = \{(1,2), (1,3), (1,4), (2,4), (2,5), (3,5)\},$$

and the graph $G_A$ can be displayed as follows:

[Figure: the graph $G_A$ with vertices 1, ..., 5 and the six edges listed above.]

A path of length $m$ from the vertex $k_1$ to the vertex $k_{m+1}$ is an ordered list of
vertices $k_1, k_2, \dots, k_{m+1}$, where $(k_i, k_{i+1}) \in E_A$ for $i = 1, \dots, m$. If $k_1 = k_{m+1}$,
then this is a closed path of length $m$. In the above example, paths from 1 to 4 are
given by 1, 2, 4 and 1, 2, 5, 3, 1, 2, 4; these have the lengths 2 and 6, respectively.
In the mathematical field of Graph Theory one usually assumes that the vertices in
a path are pairwise distinct. Our deviation from this convention is motivated by the
following interpretation of a matrix $A$ and its powers:

An entry $a_{ij} = 1$ in the matrix $A$ means that there exists a path of length 1 from
vertex $i$ to vertex $j$, i.e., the vertices $i$ and $j$ are adjacent. If $a_{ij} = 0$, then no such
path exists. The matrix $A$ is therefore called the adjacency matrix of the graph $G_A$.
If we square the adjacency matrix, then the entry in the $(i, j)$ position is given by

$$(A^2)_{ij} = \sum_{\ell=1}^{n} a_{i\ell} a_{\ell j}.$$

In the sum on the right hand side, we obtain for a given $\ell$ a 1 if and only if $(i, \ell) \in E_A$
and $(\ell, j) \in E_A$. The sum on the right hand side therefore is equal to the number of
vertices that are adjacent to both $i$ and $j$. Hence the $(i, j)$ entry of $A^2$ is equal to the
number of pairwise distinct paths from $i$ to $j$ ($i \neq j$), or the pairwise distinct closed
paths from $i$ to $i$ of length 2 in $G_A$. More generally, one can show the following (cp.
Exercise 17.10):

Let $A = [a_{ij}] \in \mathbb{C}^{n,n}$ be a symmetric adjacency matrix, i.e., $A = A^T$ with $a_{ii} = 0$
and $a_{ij} \in \{0, 1\}$ for all $i, j = 1, \dots, n$, and let $G_A$ be the graph identified with $A$.
Then for each $m \in \mathbb{N}$ the $(i, j)$ entry of $A^m$ is equal to the number of pairwise distinct
paths from $i$ to $j$ ($i \neq j$) or the pairwise distinct closed paths from $i$ to $i$ of length
$m$ in $G_A$.

For the above matrix $A$ we obtain

$$A^2 = \begin{bmatrix} 3 & 1 & 0 & 1 & 2 \\ 1 & 3 & 2 & 1 & 0 \\ 0 & 2 & 2 & 1 & 0 \\ 1 & 1 & 1 & 2 & 1 \\ 2 & 0 & 0 & 1 & 2 \end{bmatrix} \quad \text{and} \quad A^3 = \begin{bmatrix} 2 & 6 & 5 & 4 & 1 \\ 6 & 2 & 1 & 4 & 5 \\ 5 & 1 & 0 & 2 & 4 \\ 4 & 4 & 2 & 2 & 2 \\ 1 & 5 & 4 & 2 & 0 \end{bmatrix}.$$

The 3 pairwise distinct closed paths of length 2 from 1 to 1 are

1,2,1, 1,3,1, 1,4,1

and the 4 pairwise distinct paths of length 3 from 1 to 4 are

1,2,1,4, 1,3,1,4, 1,4,1,4, 1,4,2,4.


Numerous real world applications involve networks that can be modeled mathematically
using graphs. Examples include social, biological, telecommunication or
airline networks. The properties of such networks are studied in the interdisciplinary
area of Network Science. An important task is to identify participants in the network
that are central in the sense that their functionality has a significant impact on the
entire network. If the network has been modeled by a graph, then we can study the
centrality of the vertices. For example, a vertex can be considered central if it is
connected to a large part of the graph via many short closed paths. Longer connections
are usually less important, and thus paths should be scaled down according to their
length. If we use the scaling factor $1/m!$ for a path of length $m$, then for the vertex $i$
in the graph $G_A$ with the adjacency matrix $A$ we obtain a centrality measure of the
form

$$\Bigl(\frac{1}{1!} A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots\Bigr)_{ii}.$$

The relative ordering of the vertices according to this formula is not changed when
we add the constant 1. We then obtain the centrality of the vertex $i$ as

$$\Bigl(I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots\Bigr)_{ii} = (\exp(A))_{ii}.$$

Another important quantity is the so-called communicability between the vertices $i$
and $j$ for $i \neq j$, which is given by the weighted sum of the pairwise distinct paths
from $i$ to $j$, i.e., by

$$\Bigl(I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots\Bigr)_{ij} = (\exp(A))_{ij}.$$

For the above matrix $A$ the MATLAB function expm yields

$$\exp(A) = \begin{bmatrix} 3.7630 & 3.1953 & 2.2500 & 2.7927 & 1.8176 \\ 3.1953 & 3.7630 & 1.8176 & 2.7927 & 2.2500 \\ 2.2500 & 1.8176 & 2.4881 & 1.2749 & 1.9204 \\ 2.7927 & 2.7927 & 1.2749 & 2.8907 & 1.2749 \\ 1.8176 & 2.2500 & 1.9204 & 1.2749 & 2.4881 \end{bmatrix}.$$

The vertices 1 and 2 have the largest centrality, followed by 4, 3 and 5. If we
defined the centrality of a vertex as the number of adjacent vertices, then in this example
we could not distinguish between the vertices 3, 4 and 5. The largest communicability
in this example exists between the vertices 1 and 2.

Further information concerning the analysis of networks using adjacency matrices
and matrix functions can be found in the article [EstH10].
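
The path counts and centrality values of this example can be reproduced with a few lines of code. The following Python sketch uses the adjacency matrix from the example; the truncated series used for $\exp(A)$ and the helper names are assumptions made for this illustration and are not how MATLAB's expm works internally.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Truncated exponential series: sum of M^j / j! for j < terms."""
    n = len(M)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        total = [[a + b for a, b in zip(r, s)] for r, s in zip(total, term)]
    return total

# Adjacency matrix of the graph G_A from the example.
A = [[0, 1, 1, 1, 0],
     [1, 0, 0, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0]]

A2 = matmul(A, A)
A3 = matmul(A2, A)
closed_paths_len2_at_1 = A2[0][0]   # closed paths of length 2 at vertex 1
paths_len3_from_1_to_4 = A3[0][3]   # paths of length 3 from vertex 1 to 4

E = exp_series(A)
centrality = [E[i][i] for i in range(5)]
print(closed_paths_len2_at_1, paths_len3_from_1_to_4)
print([round(c, 4) for c in centrality])
```

The integer powers give the 3 closed paths and the 4 paths counted above, and the diagonal of the series approximation agrees with the displayed values of $\exp(A)$ to the printed accuracy.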
A differential equation describes a relationship between a desired function and its
derivatives. Such equations are used in all areas of science and engineering for
modeling physical phenomena. Ordinary differential equations involve a function of
one variable and its derivatives, while partial differential equations involve functions
of several variables and their partial derivatives. In this section we focus on ordinary
differential equations of first order, i.e., those in which only the function and its first
derivative occur.

A simple example for the modeling with ordinary differential equations of first
order is the increase or decrease of a biological population, such as bacteria in a petri
dish. Let $y = y(t)$ be the size of the population at time $t$. If there is enough food
and if the external conditions (e.g. temperature or pressure) are constant, then the
population grows with a (real) rate $k > 0$ that is proportional to the current number
of individuals. This can be described by the equation

$$\dot y := \frac{d}{dt} y = k y. \tag{17.8}$$

Clearly, one can also take $k < 0$, and then the population shrinks.
We are then looking for a function $y : D \subseteq \mathbb{R} \to \mathbb{R}$ that satisfies (17.8). The
general solution of (17.8) is given by the exponential function

$$y = c\,e^{tk},$$

where $c \in \mathbb{R}$ is an arbitrary constant. For a unique solution of (17.8) we need to
know the size of the population at a given initial time $t_0$. In this way we obtain the
initial value problem

$$\dot y = k y, \quad y(t_0) = y_0,$$

which, as we will show below, is solved uniquely by the function

$$y = e^{(t - t_0)k}\, y_0.$$
Example 17.8 In a chemical reaction certain initial substances (called educts or 
reactants) are transformed into other substances (called products). Reactions can be 
distinguished concerning their order. Here we only discuss reactions of first order, 
where the reaction rate is determined by only one educt. In reactions of second and 
higher order one typically obtains nonlinear differential equations, which are beyond 
our focus in this chapter. 

If, for example, the educt A, is transformed into the product A with the rate 
—k, < 0, then we write this reaction symbolically as 


ky 
Ay — > Ad, 


and we model it mathematically by the ordinary differential equation 


yi ==kiyi 


Here the value yı(t) is the concentration of the substance Aj, at time t. For the 
concentration of the product Az, which grows with the rate kj > 0, we have the 
corresponding equation y2 = kıyı. 

It may happen that a reaction of first order develops in both directions. If A 
transforms into A> with the rate —k,, and A> transforms into A, with the rate —k>, 
1.€., 

kı 
A Añ; 


— 
k2 


then we can model this reaction mathematically by the system of linear ordinary 
differential equations 


yı = —ki yı + kyr, 
y2 = kıyı — ka y2. 


Combining the functions yı and yz in a vector valued function y = [y;, y2]’, we 
can write this system as 


-N _ | ki k 
y = Ay, where a=] k mat 
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The derivative of the function y(t) is always considered entrywise, 


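Reactions can also have several steps. For example, a reaction of the form

\[ A_1 \xrightarrow{\;k_1\;} A_2 \underset{k_3}{\overset{k_2}{\rightleftharpoons}} A_3 \xrightarrow{\;k_4\;} A_4 \]

leads to the differential equations

\begin{align*}
\dot{y}_1 &= -k_1 y_1, \\
\dot{y}_2 &= k_1 y_1 - k_2 y_2 + k_3 y_3, \\
\dot{y}_3 &= k_2 y_2 - (k_3 + k_4) y_3, \\
\dot{y}_4 &= k_4 y_3,
\end{align*}

and thus to the system

\[ \dot{y} = Ay, \quad \text{where} \quad A = \begin{bmatrix} -k_1 & 0 & 0 & 0 \\ k_1 & -k_2 & k_3 & 0 \\ 0 & k_2 & -(k_3 + k_4) & 0 \\ 0 & 0 & k_4 & 0 \end{bmatrix}. \]
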


The sum of the entries in each column of A is equal to zero, since for every decrease 
in a substance with a certain rate other substances increase with the same rate. 

In summary, a chemical reaction of first order leads to a system of linear ordinary
differential equations of first order that can be written as $\dot{y} = Ay$ with a (real) square
matrix $A$.
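
As a small numerical illustration of the column-sum property, the following Python sketch builds the matrix of the multi-step reaction $A_1 \to A_2 \rightleftharpoons A_3 \to A_4$ and checks that the entries of $Ay$ sum to zero, so that the total concentration stays constant. The rate constants and the concentration vector are hypothetical values chosen only for this example, and the helper names are ours.

```python
# Hypothetical rate constants, chosen only for illustration.
k1, k2, k3, k4 = 2.0, 1.0, 0.5, 0.25

# Matrix of the multi-step reaction A1 -> A2 <-> A3 -> A4.
A = [
    [-k1, 0.0, 0.0, 0.0],
    [k1, -k2, k3, 0.0],
    [0.0, k2, -(k3 + k4), 0.0],
    [0.0, 0.0, k4, 0.0],
]


def column_sums(M):
    """Return the list of column sums of a square matrix M."""
    n = len(M)
    return [sum(M[i][j] for i in range(n)) for j in range(n)]


def matvec(M, y):
    """Return the matrix-vector product M y."""
    return [sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M))]


col_sums = column_sums(A)

# For any concentration vector y, the entries of A y sum to zero,
# so the total concentration y1 + y2 + y3 + y4 does not change in time.
y = [1.0, 0.3, 0.2, 0.0]          # hypothetical concentrations
total_rate = sum(matvec(A, y))
print(col_sums, total_rate)
```

Up to rounding, both the column sums and the rate of change of the total concentration vanish.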


We now derive the general theory for systems of linear (real or complex) ordinary
differential equations of first order of the form

\[ \dot{y} = Ay + g, \quad t \in [0, a]. \tag{17.9} \]

Here $A \in K^{n,n}$ is a given matrix, $a$ is a given positive real number, $g : [0, a] \to K^{n,1}$
is a given function, $y : [0, a] \to K^{n,1}$ is the desired solution, and we assume that
$K = \mathbb{R}$ or $K = \mathbb{C}$. If $g(t) = 0 \in K^{n,1}$ for all $t \in [0, a]$, then the system (17.9) is
called homogeneous, otherwise it is called non-homogeneous. For a given system of
the form (17.9), the system

\[ \dot{y} = Ay, \quad t \in [0, a], \tag{17.10} \]

is called the associated homogeneous system.

Lemma 17.9 The solutions of the homogeneous system (17.10) form a subspace of
the (infinite dimensional) $K$-vector space of the continuously differentiable functions
from the interval $[0, a]$ to $K^{n,1}$.


Proof We will show the required properties according to Lemma 9.5. The function
$w = 0$ is continuously differentiable on $[0, a]$ and solves the homogeneous system
(17.10). Thus, the solution set of this system is not empty. If

\[ w_1, w_2 : [0, a] \to K^{n,1} \]

are continuously differentiable solutions and if $\alpha_1, \alpha_2 \in K$, then $w = \alpha_1 w_1 + \alpha_2 w_2$
is continuously differentiable on $[0, a]$, and

\[ \dot{w} = \alpha_1 \dot{w}_1 + \alpha_2 \dot{w}_2 = \alpha_1 A w_1 + \alpha_2 A w_2 = Aw, \]

i.e., the function $w$ is a solution of the homogeneous system. □


The following characterization of the solutions of the non-homogeneous system
(17.9) is analogous to the characterization of the solution set of a non-homogeneous
linear system of equations in Lemma 6.2 (also cp. (8) in Lemma 10.7).


Lemma 17.10 If $w_1 : [0, a] \to K^{n,1}$ is a solution of the non-homogeneous system
(17.9), then every other solution $y$ can be written as $y = w_1 + w_2$, where $w_2$ is a
solution of the associated homogeneous system (17.10).


Proof If $w_1$ and $y$ are solutions of (17.9), then $\dot{y} - \dot{w}_1 = (Ay + g) - (Aw_1 + g) =
A(y - w_1)$. The difference $w_2 := y - w_1$ thus is a solution of the associated homogeneous
system and $y = w_1 + w_2$. □


In order to describe the solutions of systems of ordinary differential equations, we
consider for a given matrix $A \in K^{n,n}$ the matrix exponential function $\exp(tA)$ from
Lemma 17.5 or (17.5)–(17.6), where we now consider $t \in [0, a]$ as real variable. The
power series of the matrix exponential function in Lemma 17.5 converges, and it can
be differentiated termwise with respect to the variable $t$, where again the derivative
of a matrix with respect to the variable $t$ is considered entrywise. This yields

\begin{align*}
\frac{d}{dt} \exp(tA) &= \frac{d}{dt} \Big( I_n + tA + \frac{1}{2}(tA)^2 + \frac{1}{6}(tA)^3 + \dots \Big) \\
&= A + tA^2 + \frac{1}{2} t^2 A^3 + \dots \\
&= A \exp(tA).
\end{align*}


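The same result is obtained by the entrywise differentiation of the matrix $\exp(tA)$ in
(17.5)–(17.6) with respect to $t$. With

\[ M(t) := \exp(tJ_d(0)), \]

we obtain

\begin{align*}
\frac{d}{dt} \exp(tJ_d(\lambda)) &= \frac{d}{dt} \big( e^{t\lambda} M(t) \big) \\
&= \lambda e^{t\lambda} M(t) + e^{t\lambda} \dot{M}(t) \\
&= \lambda e^{t\lambda} M(t) + e^{t\lambda} J_d(0) M(t) \\
&= (\lambda I_d + J_d(0))\, e^{t\lambda} M(t) \\
&= J_d(\lambda) \exp(tJ_d(\lambda)),
\end{align*}

which also gives $\frac{d}{dt} \exp(tA) = A \exp(tA)$.
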
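
The derivative formula can also be checked numerically. The following Python sketch approximates $\exp(tA)$ by a truncated power series and compares a central difference quotient with $A\exp(tA)$. The matrix, the time $t$, the step size $h$ and the number of series terms are assumptions made only for this illustration, and the truncated series is used here for simplicity rather than as a recommended algorithm.

```python
def matmul(X, Y):
    """Return the matrix product X Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def expm_taylor(A, t, terms=40):
    """Approximate exp(tA) by the first `terms` terms of the power series."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term (tA)^k / k!
    tA = [[t * a for a in row] for row in A]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, tA)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


A = [[-1.0, 2.0], [1.0, -2.0]]   # hypothetical example matrix
t, h = 0.7, 1e-5

# Central difference quotient for d/dt exp(tA).
E_plus = expm_taylor(A, t + h)
E_minus = expm_taylor(A, t - h)
numeric = [[(E_plus[i][j] - E_minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

# Right-hand side A exp(tA).
exact = matmul(A, expm_taylor(A, t))

max_error = max(abs(numeric[i][j] - exact[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The printed deviation is small and is caused only by the finite step size and rounding.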
Theorem 17.11


(1) The unique solution of the homogeneous differential equation system (17.10)
for a given initial condition $y(0) = y_0 \in K^{n,1}$ is given by the function $y =
\exp(tA) y_0$.

(2) The set of all solutions of the homogeneous differential equation system (17.10)
forms an $n$-dimensional $K$-vector space with the basis $\{\exp(tA)e_1, \dots,
\exp(tA)e_n\}$.


Proof
(1) If $y = \exp(tA) y_0$, then

\begin{align*}
\dot{y} &= \frac{d}{dt} \big( \exp(tA) y_0 \big) = \Big( \frac{d}{dt} \exp(tA) \Big) y_0 = (A \exp(tA)) y_0 \\
&= A (\exp(tA) y_0) = Ay,
\end{align*}

and $y(0) = \exp(0) y_0 = I_n y_0 = y_0$. Hence $y$ is a solution of (17.10) that satisfies
the initial condition. If $w$ is another such solution and $u := \exp(-tA) w$, then

\begin{align*}
\dot{u} &= \frac{d}{dt} \big( \exp(-tA) w \big) = -A \exp(-tA) w + \exp(-tA) \dot{w} \\
&= \exp(-tA) (\dot{w} - Aw) = 0 \in K^{n,1},
\end{align*}

which shows that the function $u$ has constant entries. In particular, we then have
$u = u(0) = w(0) = y_0 = y(0)$ and $w = \exp(tA) y_0$, where we have used that
$\exp(-tA) = (\exp(tA))^{-1}$ (cp. Lemma 17.6).

(2) Each of the functions $\exp(tA)e_j : [0, a] \to K^{n,1}$, $j = 1, \dots, n$,
solves the homogeneous system $\dot{y} = Ay$. Since the matrix $\exp(tA) \in K^{n,n}$ is
invertible for every $t \in [0, a]$ (cp. Lemma 17.6), these functions are linearly
independent.

If $y$ is an arbitrary solution of $\dot{y} = Ay$, then $y(0) = y_0$ for some $y_0 \in K^{n,1}$. By
(1) then $y$ is the unique solution of the initial value problem with $y(0) = y_0$, so
that $y = \exp(tA) y_0$. As a consequence, $y$ is a linear combination of the functions
$\exp(tA)e_1, \dots, \exp(tA)e_n$. □
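
To see the solution formula $y = \exp(tA)y_0$ at work, the following Python sketch evaluates it for the reversible reaction matrix $A = \begin{bmatrix} -k_1 & k_2 \\ k_1 & -k_2 \end{bmatrix}$ and compares the result with the elementary closed form that is available in this $2 \times 2$ case, where $y_1 + y_2$ is constant. The rate constants, the initial value and the time are hypothetical, and the series summation is only a simple stand-in for a proper algorithm.

```python
import math


def expm_times_vector(A, t, y0, terms=60):
    """Approximate exp(tA) y0 by summing the series  sum_k (tA)^k y0 / k!."""
    n = len(A)
    term = y0[:]            # (tA)^k y0 / k!, starting with k = 0
    result = y0[:]
    for k in range(1, terms):
        term = [t * sum(A[i][j] * term[j] for j in range(n)) / k
                for i in range(n)]
        result = [result[i] + term[i] for i in range(n)]
    return result


k1, k2 = 2.0, 1.0                     # hypothetical rate constants
A = [[-k1, k2], [k1, -k2]]
y0 = [1.0, 0.0]                       # initially only substance A1 is present
t = 1.5

y = expm_times_vector(A, t, y0)

# Closed form for this 2x2 case: y1 + y2 is constant and
# y1 relaxes to its equilibrium value with rate k1 + k2.
total = y0[0] + y0[1]
y1_eq = k2 * total / (k1 + k2)
y1_exact = y1_eq + (y0[0] - y1_eq) * math.exp(-(k1 + k2) * t)
y_exact = [y1_exact, total - y1_exact]
print(y, y_exact)
```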


To describe the solution of the non-homogeneous system (17.9), we need the
integral of functions of the form

\[ w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} : [0, a] \to K^{n,1}. \]

For every fixed $t \in [0, a]$ we define

\[ \int_0^t w(s)\,ds := \begin{bmatrix} \int_0^t w_1(s)\,ds \\ \vdots \\ \int_0^t w_n(s)\,ds \end{bmatrix} \in K^{n,1}, \]

i.e., we apply the integral entrywise to the function $w$. By this definition we have

\[ \frac{d}{dt} \Big( \int_0^t w(s)\,ds \Big) = w(t) \]

for all $t \in [0, a]$. We can now determine an explicit solution formula for systems of
linear differential equations based on the so-called Duhamel integral.²


Theorem 17.12 The unique solution of the non-homogeneous differential equation
system (17.9) with the initial condition $y(0) = y_0 \in K^{n,1}$ is given by

\[ y = \exp(tA) y_0 + \exp(tA) \int_0^t \exp(-sA) g(s)\,ds. \tag{17.11} \]

²Jean-Marie Constant Duhamel (1797–1872).

Proof The derivative of the function $y$ defined in (17.11) is

\begin{align*}
\dot{y} &= \frac{d}{dt} \big( \exp(tA) y_0 \big) + \frac{d}{dt} \Big( \exp(tA) \int_0^t \exp(-sA) g(s)\,ds \Big) \\
&= A \exp(tA) y_0 + A \exp(tA) \int_0^t \exp(-sA) g(s)\,ds + \exp(tA) \exp(-tA) g \\
&= A \exp(tA) y_0 + A \exp(tA) \int_0^t \exp(-sA) g(s)\,ds + g \\
&= Ay + g.
\end{align*}

Furthermore, we have

\[ y(0) = \exp(0) y_0 + \exp(0) \int_0^0 \exp(-sA) g(s)\,ds = y_0, \]

so that $y$ also satisfies the initial condition.

Let now $\widetilde{y}$ be another solution of (17.9) that satisfies the initial condition. By
Lemma 17.10 we then have $\widetilde{y} = y + w$, where $w$ solves the homogeneous system
(17.10). Therefore, $w = \exp(tA)c$ for some $c \in K^{n,1}$ (cp. (2) in Theorem 17.11).
For $t = 0$ we obtain $y_0 = y_0 + c$, so that $c = 0$ and hence $\widetilde{y} = y$. □
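
As a quick sanity check of (17.11), consider the scalar case $n = 1$ with $A = [\alpha]$, $\alpha \neq 0$, and a constant function $g(t) = b$. Both $\alpha$ and $b$ are assumptions made only for this illustration. Then

```latex
\begin{align*}
y(t) &= e^{t\alpha} y_0 + e^{t\alpha} \int_0^t e^{-s\alpha}\, b \, ds
      = e^{t\alpha} y_0 + e^{t\alpha}\, \frac{b}{\alpha} \big( 1 - e^{-t\alpha} \big)
      = e^{t\alpha} y_0 + \frac{b}{\alpha} \big( e^{t\alpha} - 1 \big), \\
\dot{y}(t) &= \alpha e^{t\alpha} y_0 + b\, e^{t\alpha}
      = \alpha \Big( e^{t\alpha} y_0 + \frac{b}{\alpha} \big( e^{t\alpha} - 1 \big) \Big) + b
      = \alpha\, y(t) + b .
\end{align*}
```

Thus $y$ satisfies $\dot{y} = \alpha y + b$ and $y(0) = y_0$, in agreement with Theorem 17.12.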


In the above theorems we have shown that for the explicit solution of systems of 
linear ordinary differential equations of first order, we have to compute the matrix 
exponential function. While we have introduced this function using the Jordan canon- 
ical form of the given matrix, numerical computations based on the Jordan canonical 
form are not advisable (cp. Example 16.20). Because of its significant practical rele- 
vance, numerous different algorithms for computing the matrix exponential function 
have been proposed. But, as shown in the article [MolV03], no existing algorithm is 
completely satisfactory. 
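
To give an impression of what such an algorithm can look like, the following Python sketch combines a truncated power series with the identity $\exp(A) = (\exp(A/2^s))^{2^s}$, an idea usually called scaling and squaring. It is a minimal sketch under our own assumptions (the choice of $s$, the number of series terms, and the test matrix), not a robust implementation and not the method of any particular library.

```python
import math


def matmul(X, Y):
    """Return the product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_scaling_squaring(A, terms=20):
    """Approximate exp(A) via exp(A) = (exp(A / 2**s)) ** (2**s)."""
    n = len(A)
    # Choose s so that the scaled matrix has maximum row sum norm at most 1/2.
    norm = max(sum(abs(a) for a in row) for row in A)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    B = [[a / 2 ** s for a in row] for row in A]
    # Truncated power series for exp(B).
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, B)]
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling by squaring s times.
    for _ in range(s):
        E = matmul(E, E)
    return E


# Test with a Jordan block J_2(3): exp(J_2(3)) = e^3 * [[1, 1], [0, 1]].
E = expm_scaling_squaring([[3.0, 1.0], [0.0, 3.0]])
print(E, math.exp(3.0))
```

The test matrix is a Jordan block, for which the exact result follows from the formula for $\exp(tJ_d(\lambda))$ with $t = 1$.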


Example 17.13 The example from circuit simulation presented in Sect. 1.5 led to
the system of ordinary differential equations


es, a ten, 
dt ; f° 
d,__ 
dt © C 


Using (17.11) and the initial values $I(0) = I^0$ and $V_C(0) = V_C^0$, we obtain the
solution

ae p -R/L —1/L Į” 

val | =17e 0 ve 

t 
B -R/L —1/L Vs(s) 
[leE] 

Example 17.14 Let us also consider an example from Mechanics. A weight with
mass $m > 0$ is attached to a spring with the spring constant $\mu > 0$. Let $x_0 > 0$ be the
distance of the weight from its equilibrium position, as illustrated in the following
figure:

[Figure: a weight of mass $m$ attached to a spring, at distance $x_0$ from its equilibrium position.]

We want to determine the position $x(t)$ of the weight at time $t \geq 0$, where $x(0) =
x_0$. The extension of the spring is described by Hooke's law.³ The corresponding
ordinary differential equation of second order is

\[ \ddot{x} + \frac{\mu}{m}\, x = 0, \]

with initial conditions $x(0) = x_0$ and $\dot{x}(0) = v_0$, where $v_0 > 0$ is the initial velocity
of the weight. We can write this differential equation of second order for $x$ as a
system of first order by introducing the velocity $v$ as new variable. The velocity is
given by the derivative of the position with respect to time, i.e., $v = \dot{x}$, and thus for
the acceleration we have $\dot{v} = \ddot{x}$, which yields the system

\[ \dot{y} = Ay, \quad \text{where} \quad A = \begin{bmatrix} 0 & 1 \\ -\frac{\mu}{m} & 0 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} x \\ v \end{bmatrix}. \]

The initial condition then is $y(0) = y_0 = [x_0, v_0]^T$.

By Theorem 17.11, the unique solution of this homogeneous initial value problem
is given by the function $y = \exp(tA) y_0$. We consider $A$ as an element of $\mathbb{C}^{2,2}$. The
eigenvalues of $A$ are the two complex (non-real) numbers $\lambda_1 = i\rho$ and $\lambda_2 = -i\rho =
\overline{\lambda}_1$, where $\rho := \sqrt{\mu/m}$. Corresponding eigenvectors are

\[ s_1 = \begin{bmatrix} 1 \\ i\rho \end{bmatrix}, \quad s_2 = \begin{bmatrix} 1 \\ -i\rho \end{bmatrix}, \]

and thus

\[ \exp(tA) y_0 = S \begin{bmatrix} e^{it\rho} & 0 \\ 0 & e^{-it\rho} \end{bmatrix} S^{-1} y_0, \quad S = \begin{bmatrix} 1 & 1 \\ i\rho & -i\rho \end{bmatrix} \in \mathbb{C}^{2,2}. \]

³Sir Robert Hooke (1635–1703).

Exercises


17.1 Construct a matrix $A = [a_{ij}] \in \mathbb{C}^{2,2}$ with $A^3 \neq [a_{ij}^3]$.

17.2 Determine all solutions $X \in \mathbb{C}^{2,2}$ of the matrix equation $X^2 = I_2$, and classify
which of these solutions are primary square roots of $I_2$.

17.3 Determine a matrix $X \in \mathbb{C}^{2,2}$ with real entries and $X^2 = -I_2$.

17.4 Prove Lemma 17.3.

17.5 Prove the following assertions for $A \in \mathbb{C}^{n,n}$:

(a) $\det(\exp(A)) = \exp(\operatorname{trace}(A))$.
(b) If $A^H = -A$, then $\exp(A)$ is unitary.
(c) If $A^2 = I$, then $\exp(A) = \frac{1}{2}\big(e + \frac{1}{e}\big) I + \frac{1}{2}\big(e - \frac{1}{e}\big) A$.

17.6 Let $A = S \operatorname{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) S^{-1} \in \mathbb{C}^{n,n}$ with $\operatorname{rank}(A) = n$. Determine
the primary matrix function $f(A)$ for $f(z) = z^{-1}$. Does this function
also exist if $\operatorname{rank}(A) < n$?

17.7 Let $\log : \{z = re^{i\varphi} \mid r > 0, \; -\pi < \varphi < \pi\} \to \mathbb{C}$, $re^{i\varphi} \mapsto \ln(r) + i\varphi$, be the
principal branch of the complex logarithm (where $\ln$ denotes the real natural
logarithm). Show that this function is defined on the spectrum of

\[ A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \in \mathbb{C}^{2,2}, \]

and compute $\log(A)$ as well as $\exp(\log(A))$.

17.8 Compute

\[ \exp\left( \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right), \quad \exp\left( \begin{bmatrix} -1 & 1 \\ -1 & -3 \end{bmatrix} \right), \quad \sin\left( \begin{bmatrix} \pi & 1 & 1 \\ 0 & \pi & 1 \\ 0 & 0 & \pi \end{bmatrix} \right). \]

17.9 Construct two matrices $A, B \in \mathbb{C}^{2,2}$ with $\exp(A + B) \neq \exp(A) \exp(B)$.

17.10 Prove the assertion on the entries of $A^d$ in Example 17.7.

17.11 Let

\[ A = \begin{bmatrix} 5 & 1 & 1 \\ 0 & 5 & 1 \\ 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{3,3}. \]

Compute $\exp(tA)$ for $t \in \mathbb{R}$ and solve the homogeneous system of differential
equations $\dot{y} = Ay$ with the initial condition $y(0) = [1, 1, 1]^T$.

17.12 Compute the matrix $\exp(tA)$ from Example 17.14 explicitly and thus show
that $\exp(tA) \in \mathbb{R}^{2,2}$ (for $t \in \mathbb{R}$), despite the fact that the eigenvalues and
eigenvectors of $A$ are not real.


Chapter 18 
Special Classes of Endomorphisms 


In this chapter we discuss some classes of endomorphisms (or square matrices) 
whose eigenvalues and eigenvectors have special properties. Such properties only 
exist under further assumptions, and in this chapter our assumptions concern the 
relationship between the given endomorphism and its adjoint endomorphism. Thus, 
we focus on Euclidean or unitary vector spaces. This leads to the classes of nor- 
mal, orthogonal, unitary and selfadjoint endomorphisms. Each of these classes has 
a natural counterpart in the set of square (real or complex) matrices. 


18.1 Normal Endomorphisms 


We start with the definition of a normal¹ endomorphism or matrix.


Definition 18.1 Let $V$ be a finite dimensional Euclidean or unitary vector space. An
endomorphism $f \in L(V, V)$ is called normal if $f \circ f^{ad} = f^{ad} \circ f$. A matrix $A \in \mathbb{R}^{n,n}$
or $A \in \mathbb{C}^{n,n}$ is called normal if $A^T A = A A^T$ or $A^H A = A A^H$, respectively.


For all $z \in \mathbb{C}$ we have $\overline{z} z = |z|^2 = z \overline{z}$. The property of normality can therefore
be interpreted as a generalization of this property of complex numbers.

We will first study the properties of normal endomorphisms on a finite dimensional
unitary vector space $V$. Recall the following results:


(1) If $B$ is an orthonormal basis of $V$ and if $f \in L(V, V)$, then $([f]_{B,B})^H = [f^{ad}]_{B,B}$
(cp. Theorem 13.12).

(2) Every f € L(V, V) can be unitarily triangulated (cp. Corollary 14.20, Schur’s 
theorem). This does not hold in general in the Euclidean case, since not every 
real polynomial decomposes into linear factors over R. 


¹This term was introduced by Otto Toeplitz (1881–1940) in 1918 in the context of bilinear forms.

Using these results we obtain the following characterization of normal endomorphisms
on a unitary vector space.


Theorem 18.2 If $V$ is a finite dimensional unitary vector space, then $f \in L(V, V)$
is normal if and only if there exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$ is
a diagonal matrix, i.e., $f$ is unitarily diagonalizable.


Proof Let $f \in L(V, V)$ be normal and let $B$ be an orthonormal basis of $V$ such
that $R := [f]_{B,B}$ is an upper triangular matrix. Then $R^H = [f^{ad}]_{B,B}$, and from
$f \circ f^{ad} = f^{ad} \circ f$ we obtain

\[ R R^H = [f \circ f^{ad}]_{B,B} = [f^{ad} \circ f]_{B,B} = R^H R. \]

We now show by induction on $n = \dim(V)$ that $R$ is diagonal. This is obvious for
$n = 1$.
Let the assertion hold for an $n \geq 1$, and let $R \in \mathbb{C}^{n+1,n+1}$ be upper triangular with
$R R^H = R^H R$. We write $R$ as

\[ R = \begin{bmatrix} R_1 & r_1 \\ 0 & \alpha_1 \end{bmatrix}, \]

where $R_1 \in \mathbb{C}^{n,n}$ is upper triangular, $r_1 \in \mathbb{C}^{n,1}$, and $\alpha_1 \in \mathbb{C}$. Then

\[ \begin{bmatrix} R_1 R_1^H + r_1 r_1^H & \overline{\alpha}_1 r_1 \\ \alpha_1 r_1^H & |\alpha_1|^2 \end{bmatrix} = R R^H = R^H R = \begin{bmatrix} R_1^H R_1 & R_1^H r_1 \\ r_1^H R_1 & r_1^H r_1 + |\alpha_1|^2 \end{bmatrix}. \]

From $|\alpha_1|^2 = r_1^H r_1 + |\alpha_1|^2$ we obtain $r_1^H r_1 = 0$, hence $r_1 = 0$ and $R_1 R_1^H = R_1^H R_1$.
By the induction hypothesis, $R_1 \in \mathbb{C}^{n,n}$ is diagonal, and therefore

\[ R = \begin{bmatrix} R_1 & 0 \\ 0 & \alpha_1 \end{bmatrix} \]

is diagonal as well.

Conversely, suppose that there exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$
is diagonal. Then $[f^{ad}]_{B,B} = ([f]_{B,B})^H$ is diagonal and, since diagonal matrices
commute, we have

\[ [f \circ f^{ad}]_{B,B} = [f]_{B,B} [f^{ad}]_{B,B} = [f^{ad}]_{B,B} [f]_{B,B} = [f^{ad} \circ f]_{B,B}, \]

which implies $f \circ f^{ad} = f^{ad} \circ f$, and hence $f$ is normal. □


The application of this theorem to the unitary vector space $V = \mathbb{C}^{n,1}$ with the
standard scalar product and a matrix $A \in \mathbb{C}^{n,n}$ viewed as element of $L(V, V)$ yields
the following “matrix version”.


Corollary 18.3 A matrix $A \in \mathbb{C}^{n,n}$ is normal if and only if there exists an orthonormal
basis of $\mathbb{C}^{n,1}$ consisting of eigenvectors of $A$, i.e., $A$ is unitarily diagonalizable.

The following theorem presents another characterization of normal endomorphisms
on a unitary vector space.


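Theorem 18.4 If $V$ is a finite dimensional unitary vector space, then $f \in L(V, V)$
is normal if and only if there exists a polynomial $p \in \mathbb{C}[t]$ with $p(f) = f^{ad}$.


Proof If $p(f) = f^{ad}$ for a polynomial $p \in \mathbb{C}[t]$, then

\[ f \circ f^{ad} = f \circ p(f) = p(f) \circ f = f^{ad} \circ f, \]

and hence $f$ is normal.
Conversely, if $f$ is normal, then there exists an orthonormal basis $B$ of $V$, such
that $[f]_{B,B} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Furthermore,

\[ [f^{ad}]_{B,B} = ([f]_{B,B})^H = \operatorname{diag}(\overline{\lambda}_1, \dots, \overline{\lambda}_n). \]

Let $p \in \mathbb{C}[t]$ be a polynomial with $p(\lambda_j) = \overline{\lambda}_j$ for $j = 1, \dots, n$. Such a polynomial
can be explicitly constructed using the Lagrange basis of $\mathbb{C}[t]_{\leq n-1}$ (cp. Exercise 10.12).
Then

\begin{align*}
[f^{ad}]_{B,B} &= \operatorname{diag}(\overline{\lambda}_1, \dots, \overline{\lambda}_n) = \operatorname{diag}(p(\lambda_1), \dots, p(\lambda_n)) = p(\operatorname{diag}(\lambda_1, \dots, \lambda_n)) \\
&= p([f]_{B,B}) = [p(f)]_{B,B},
\end{align*}

and hence also $f^{ad} = p(f)$. □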
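
For a concrete instance of this construction, take the normal matrix $A = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}$, viewed as an element of $\mathbb{C}^{2,2}$, with the eigenvalues $\lambda_1 = 1 + 2i$ and $\lambda_2 = 1 - 2i$. The matrix is chosen here only for illustration. The Lagrange interpolation polynomial with $p(\lambda_j) = \overline{\lambda}_j$ is

```latex
\begin{align*}
p(t) &= \overline{\lambda}_1\, \frac{t - \lambda_2}{\lambda_1 - \lambda_2}
      + \overline{\lambda}_2\, \frac{t - \lambda_1}{\lambda_2 - \lambda_1}
      = \frac{\lambda_2 (t - \lambda_2) - \lambda_1 (t - \lambda_1)}{\lambda_1 - \lambda_2}
      = (\lambda_1 + \lambda_2) - t = 2 - t, \\
p(A) &= 2 I_2 - A = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix} = A^T = A^H .
\end{align*}
```

Here we used $\overline{\lambda}_1 = \lambda_2$ and $\overline{\lambda}_2 = \lambda_1$. The adjoint of $A$ is therefore indeed a polynomial in $A$.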


Several other characterizations of normal endomorphisms on a finite dimensional
unitary vector space and of normal matrices $A \in \mathbb{C}^{n,n}$ can be found in the
article [HorJ12] (see also Exercise 18.8).

We now consider the Euclidean case, where we focus on real square matrices. 
All the results can be formulated analogously for normal endomorphisms on a finite 
dimensional Euclidean vector space. 

Let $A \in \mathbb{R}^{n,n}$ be normal, i.e., $A^T A = A A^T$. Then $A$ also satisfies $A^H A = A A^H$
and when $A$ is considered as an element of $\mathbb{C}^{n,n}$, it is unitarily diagonalizable, i.e.,
$A = S D S^H$ holds for a unitary matrix $S \in \mathbb{C}^{n,n}$ and a diagonal matrix $D \in \mathbb{C}^{n,n}$.
Despite the fact that $A$ has real entries, neither $S$ nor $D$ will be real in general, since
$A$ as an element of $\mathbb{R}^{n,n}$ may not be diagonalizable. For instance,

\[ A = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix} \in \mathbb{R}^{2,2} \]

is a normal matrix that is not diagonalizable (over $\mathbb{R}$). Considered as element of $\mathbb{C}^{2,2}$,
it has the eigenvalues $1 + 2i$ and $1 - 2i$ and it is unitarily diagonalizable.
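
These claims are easy to confirm by a short computation. The following Python sketch checks $A^T A = A A^T$ for this matrix and computes its eigenvalues from the characteristic polynomial $t^2 - \operatorname{trace}(A)\, t + \det(A)$ of a $2 \times 2$ matrix. The helper names are ours.

```python
import cmath


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product X Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


A = [[1, 2], [-2, 1]]
At = transpose(A)

# Normality: A^T A and A A^T must coincide.
is_normal = matmul(At, A) == matmul(A, At)

# Eigenvalues of a 2x2 matrix from its characteristic polynomial
# t^2 - trace(A) t + det(A).
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
print(is_normal, eigenvalues)
```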

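To discuss the case of real normal matrices in more detail, we first prove a “real
version” of Schur's theorem.

Theorem 18.5 For every matrix $A \in \mathbb{R}^{n,n}$ there exists an orthogonal matrix
$U \in \mathbb{R}^{n,n}$ with

\[ U^T A U = R = \begin{bmatrix} R_{11} & \cdots & R_{1m} \\ & \ddots & \vdots \\ & & R_{mm} \end{bmatrix} \in \mathbb{R}^{n,n}, \]

where for every $j = 1, \dots, m$ either $R_{jj} \in \mathbb{R}^{1,1}$ or

\[ R_{jj} = \begin{bmatrix} r_1^{(j)} & r_2^{(j)} \\ r_3^{(j)} & r_4^{(j)} \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with} \quad r_3^{(j)} \neq 0. \]

In the second case $R_{jj}$ has, considered as complex matrix, a pair of complex conjugate
eigenvalues of the form $\alpha_j \pm i\beta_j$ with $\alpha_j \in \mathbb{R}$ and $\beta_j \in \mathbb{R} \setminus \{0\}$. The matrix $R$ is
called a real Schur form of $A$.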


Proof We proceed via induction on $n$. For $n = 1$ we have $A = [a_{11}] = R$ and
$U = [1]$.

Suppose that the assertion holds for some $n \geq 1$ and let $A \in \mathbb{R}^{n+1,n+1}$ be given.
We consider $A$ as an element of $\mathbb{C}^{n+1,n+1}$. Then $A$ has an eigenvalue $\lambda = \alpha + i\beta \in \mathbb{C}$,
$\alpha, \beta \in \mathbb{R}$, corresponding to the eigenvector $v = x + iy \in \mathbb{C}^{n+1,1}$, $x, y \in \mathbb{R}^{n+1,1}$,
and we have $Av = \lambda v$. Dividing this equation into its real and imaginary parts, we
obtain the two real equations

\[ Ax = \alpha x - \beta y \quad \text{and} \quad Ay = \beta x + \alpha y. \tag{18.1} \]

We have two cases:

Case 1: $\beta = 0$. Then the two equations in (18.1) are $Ax = \alpha x$ and $Ay =
\alpha y$. Thus at least one of the real vectors $x$ or $y$ is an eigenvector corresponding
to the real eigenvalue $\alpha$ of $A$. Without loss of generality we assume that this is
the vector $x$ and that $\|x\|_2 = 1$. We extend $x$ by the vectors $w_2, \dots, w_{n+1}$ to an
orthonormal basis of $\mathbb{R}^{n+1,1}$ with respect to the standard scalar product. The matrix
$U_1 := [x, w_2, \dots, w_{n+1}] \in \mathbb{R}^{n+1,n+1}$ then is orthogonal and satisfies

\[ U_1^T A U_1 = \begin{bmatrix} \alpha & \star \\ 0 & A_1 \end{bmatrix} \]

for a matrix $A_1 \in \mathbb{R}^{n,n}$. By the induction hypothesis there exists an orthogonal matrix
$U_2 \in \mathbb{R}^{n,n}$ such that $R_1 := U_2^T A_1 U_2$ has the desired form. The matrix

\[ U := U_1 \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix} \]

is orthogonal and satisfies

\[ U^T A U = \begin{bmatrix} 1 & 0 \\ 0 & U_2^T \end{bmatrix} U_1^T A U_1 \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix} = \begin{bmatrix} \alpha & \star \\ 0 & R_1 \end{bmatrix} =: R, \]

where $R$ has the desired form.

Case 2: $\beta \neq 0$. We first show that $x, y$ are linearly independent. If $x = 0$, then
using $\beta \neq 0$ in the first equation in (18.1) implies that also $y = 0$. This is not possible,
since the eigenvector $v = x + iy$ must be nonzero. Thus, $x \neq 0$, and using $\beta \neq 0$ in
the second equation in (18.1) implies that also $y \neq 0$. If $x, y \in \mathbb{R}^{n+1,1} \setminus \{0\}$ are linearly
dependent, then there exists a $\mu \in \mathbb{R} \setminus \{0\}$ with $y = \mu x$. The two equations in (18.1)
then can be written as

\[ Ax = (\alpha - \beta\mu) x \quad \text{and} \quad Ax = \frac{1}{\mu} (\beta + \alpha\mu) x, \]

which implies that $\beta(1 + \mu^2) = 0$. Since $1 + \mu^2 \neq 0$ for all $\mu \in \mathbb{R}$, this implies
$\beta = 0$, which contradicts the assumption that $\beta \neq 0$. Consequently, $x, y$ are linearly
independent.

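We can combine the two equations in (18.1) to the system

\[ A[x, y] = [x, y] \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}, \]

where $\operatorname{rank}([x, y]) = 2$. Applying the Gram-Schmidt method with respect to the
standard scalar product of $\mathbb{R}^{n+1,1}$ to the matrix $[x, y] \in \mathbb{R}^{n+1,2}$ yields

\[ [x, y] = [q_1, q_2] \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix} =: Q R_1, \]

with $Q^T Q = I_2$ and $R_1 \in GL_2(\mathbb{R})$. It then follows that

\[ AQ = A[x, y] R_1^{-1} = [x, y] \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1} = Q R_1 \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1}. \]

The real matrix

\[ R_2 := R_1 \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1} \]

has, considered as element of $\mathbb{C}^{2,2}$, the pair of complex conjugate eigenvalues $\alpha \pm i\beta$
with $\beta \neq 0$. In particular, the $(2, 1)$-entry of $R_2$ is nonzero, since otherwise $R_2$ would
have two real eigenvalues.

We again extend $q_1, q_2$ by vectors $w_3, \dots, w_{n+1}$ to an orthonormal basis of $\mathbb{R}^{n+1,1}$
with respect to the standard scalar product. (For $n = 1$ the list $w_3, \dots, w_{n+1}$ is empty.)
Then $U_1 := [Q, w_3, \dots, w_{n+1}] \in \mathbb{R}^{n+1,n+1}$ is orthogonal and we have

\[ U_1^T A U_1 = U_1^T \big[ AQ, \; A[w_3, \dots, w_{n+1}] \big] = U_1^T \big[ Q R_2, \; A[w_3, \dots, w_{n+1}] \big] = \begin{bmatrix} R_2 & \star \\ 0 & A_1 \end{bmatrix} \]

for a matrix $A_1 \in \mathbb{R}^{n-1,n-1}$. Analogously to the first case, an application of the
induction hypothesis to this matrix yields the desired matrices $R$ and $U$. □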


Theorem 18.5 implies the following result for real normal matrices. 


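Corollary 18.6 A matrix $A \in \mathbb{R}^{n,n}$ is normal if and only if there exists an orthogonal
matrix $U \in \mathbb{R}^{n,n}$ with

\[ U^T A U = \operatorname{diag}(R_1, \dots, R_m), \]

where, for every $j = 1, \dots, m$ either $R_j \in \mathbb{R}^{1,1}$ or

\[ R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with} \quad \beta_j \neq 0. \]

In the second case the matrix $R_j$ has, considered as complex matrix, a pair of complex
conjugate eigenvalues of the form $\alpha_j \pm i\beta_j$.

Proof Exercise. □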


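Example 18.7 The matrix

\[ A = \frac{1}{2} \begin{bmatrix} 0 & \sqrt{2} & -\sqrt{2} \\ -\sqrt{2} & 1 & 1 \\ \sqrt{2} & 1 & 1 \end{bmatrix} \in \mathbb{R}^{3,3} \]

has, considered as a complex matrix, the eigenvalues $1, i, -i$. It is therefore neither
diagonalizable nor can it be triangulated over $\mathbb{R}$. For the orthogonal matrix

\[ U = \frac{1}{2} \begin{bmatrix} 0 & 2 & 0 \\ -\sqrt{2} & 0 & \sqrt{2} \\ \sqrt{2} & 0 & \sqrt{2} \end{bmatrix} \in \mathbb{R}^{3,3} \]

the transformed matrix

\[ U^T A U = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

is in real Schur form.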


18.2 Orthogonal and Unitary Endomorphisms 


In this section we extend the concept of orthogonal and unitary matrices to
endomorphisms.

Definition 18.8 Let $V$ be a finite dimensional Euclidean or unitary vector space.
An endomorphism $f \in L(V, V)$ is called orthogonal or unitary, respectively, if

\[ f^{ad} \circ f = \mathrm{Id}_V. \]

If $f^{ad} \circ f = \mathrm{Id}_V$, then $f^{ad} \circ f$ is bijective and hence $f$ is injective (cp. Exercise 2.7).
Corollary 10.11 implies that $f$ is bijective. Hence $f^{ad}$ is the unique inverse
of $f$, and we also have $f \circ f^{ad} = \mathrm{Id}_V$ (cp. our remarks following Definition 2.21).


Note that an orthogonal or unitary endomorphism f is normal, and therefore all 
results from the previous section also apply to f. 


Lemma 18.9 Let $V$ be a finite dimensional Euclidean or unitary vector space and
let $f \in L(V, V)$ be orthogonal or unitary, respectively. If $B$ is an orthonormal basis
of $V$, then $[f]_{B,B}$ is an orthogonal or unitary matrix, respectively.


Proof Let $\dim(V) = n$. For every orthonormal basis $B$ of $V$ we have

\[ I_n = [\mathrm{Id}_V]_{B,B} = [f^{ad} \circ f]_{B,B} = [f^{ad}]_{B,B} [f]_{B,B} = ([f]_{B,B})^H [f]_{B,B}, \]

and thus $[f]_{B,B}$ is orthogonal or unitary, respectively. (In the Euclidean case
$([f]_{B,B})^H = ([f]_{B,B})^T$.) □


In the following lemma we show that an orthogonal or unitary endomorphism
is characterized by the fact that it does not change the scalar product of arbitrary
vectors.


Lemma 18.10 Let $V$ be a finite dimensional Euclidean or unitary vector space with
the scalar product $\langle \cdot, \cdot \rangle$. Then $f \in L(V, V)$ is orthogonal or unitary, respectively, if
and only if $\langle f(v), f(w) \rangle = \langle v, w \rangle$ for all $v, w \in V$.


Proof If $f$ is orthogonal or unitary and if $v, w \in V$, then

\[ \langle v, w \rangle = \langle \mathrm{Id}_V(v), w \rangle = \langle (f^{ad} \circ f)(v), w \rangle = \langle f(v), f(w) \rangle. \]

On the other hand, suppose that $\langle v, w \rangle = \langle f(v), f(w) \rangle$ for all $v, w \in V$. Then

\begin{align*}
0 &= \langle v, w \rangle - \langle f(v), f(w) \rangle = \langle v, w \rangle - \langle v, (f^{ad} \circ f)(w) \rangle \\
&= \langle v, (\mathrm{Id}_V - f^{ad} \circ f)(w) \rangle.
\end{align*}

Since the scalar product is non-degenerate and $v$ can be chosen arbitrarily, we have
$(\mathrm{Id}_V - f^{ad} \circ f)(w) = 0$ for all $w \in V$, and hence $\mathrm{Id}_V = f^{ad} \circ f$. □


We have the following corollary (cp. Lemma 12.13). 


Corollary 18.11 If $V$ is a finite dimensional Euclidean or unitary vector space with
the scalar product $\langle \cdot, \cdot \rangle$, $f \in L(V, V)$ is orthogonal or unitary, respectively, and
$\| \cdot \| = \langle \cdot, \cdot \rangle^{1/2}$ is the norm induced by the scalar product, then $\| f(v) \| = \| v \|$ for
all $v \in V$.

For the vector space $V = \mathbb{C}^{n,1}$ with the standard scalar product and induced norm
$\| v \|_2 = (v^H v)^{1/2}$ as well as a unitary matrix $A \in \mathbb{C}^{n,n}$, we have $\| Av \|_2 = \| v \|_2$ for
all $v \in \mathbb{C}^{n,1}$. Thus,

\[ \| A \|_2 = \sup_{v \in \mathbb{C}^{n,1} \setminus \{0\}} \frac{\| Av \|_2}{\| v \|_2} = 1 \]

(cp. (6) in Example 12.4). This holds analogously for orthogonal matrices $A \in \mathbb{R}^{n,n}$.
We now study the eigenvalues and eigenvectors of orthogonal and unitary endo- 
morphisms. 


Lemma 18.12 Let $V$ be a finite dimensional Euclidean or unitary vector space and
let $f \in L(V, V)$ be orthogonal or unitary, respectively. If $\lambda$ is an eigenvalue of $f$,
then $|\lambda| = 1$.


Proof Let $\langle \cdot, \cdot \rangle$ be the scalar product on $V$. If $f(v) = \lambda v$ with $v \neq 0$, then

\[ \langle v, v \rangle = \langle \mathrm{Id}_V(v), v \rangle = \langle (f^{ad} \circ f)(v), v \rangle = \langle f(v), f(v) \rangle = \langle \lambda v, \lambda v \rangle = |\lambda|^2 \langle v, v \rangle, \]

and $\langle v, v \rangle \neq 0$ implies that $|\lambda| = 1$. □


The statement of Lemma 18.12 holds, in particular, for unitary and orthogonal
matrices. However, one should keep in mind that an orthogonal matrix (or an orthogonal
endomorphism) may not have an eigenvalue. For example, the orthogonal matrix

\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \in \mathbb{R}^{2,2} \]

has the characteristic polynomial $P_A = t^2 + 1$, which has no real roots. If considered
as an element of $\mathbb{C}^{2,2}$, the matrix $A$ has the eigenvalues $i$ and $-i$.


Theorem 18.13


(1) If $A \in \mathbb{C}^{n,n}$ is unitary, then there exists a unitary matrix $U \in \mathbb{C}^{n,n}$ with

\[ U^H A U = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \]

and $|\lambda_j| = 1$ for $j = 1, \dots, n$.

(2) If $A \in \mathbb{R}^{n,n}$ is orthogonal, then there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with

\[ U^T A U = \operatorname{diag}(R_1, \dots, R_m), \]

where for every $j = 1, \dots, m$ either $R_j = [\lambda_j] \in \mathbb{R}^{1,1}$ with $\lambda_j = \pm 1$ or

\[ R_j = \begin{bmatrix} c_j & s_j \\ -s_j & c_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with} \quad s_j \neq 0 \quad \text{and} \quad c_j^2 + s_j^2 = 1. \]

Proof


(1) A unitary matrix $A \in \mathbb{C}^{n,n}$ is normal and hence unitarily diagonalizable (cp.
Corollary 18.3). By Lemma 18.12, all eigenvalues of $A$ have absolute value 1.

(2) An orthogonal matrix $A$ is normal and hence by Corollary 18.6 there exists an
orthogonal matrix $U \in \mathbb{R}^{n,n}$ with $U^T A U = \operatorname{diag}(R_1, \dots, R_m)$, where either
$R_j \in \mathbb{R}^{1,1}$ or

\[ R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2} \]

with $\beta_j \neq 0$. In the first case then $R_j = [\lambda_j]$ with $|\lambda_j| = 1$ by Lemma 18.12.
Since $A$ and $U$ are orthogonal, also $U^T A U$ is orthogonal, and hence every
diagonal block $R_j$ is orthogonal as well. From $R_j^T R_j = I_2$ we obtain $\alpha_j^2 + \beta_j^2 = 1$,
so that $R_j$ has the desired form. □


We now study two important classes of orthogonal matrices. 


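Example 18.14 Let $i, j, n \in \mathbb{N}$ with $1 \leq i < j \leq n$ and let $\alpha \in \mathbb{R}$. We define

\[ R_{ij}(\alpha) := \begin{bmatrix} I_{i-1} & & & & \\ & \cos(\alpha) & & -\sin(\alpha) & \\ & & I_{j-i-1} & & \\ & \sin(\alpha) & & \cos(\alpha) & \\ & & & & I_{n-j} \end{bmatrix} \in \mathbb{R}^{n,n}. \]

The matrix $R_{ij}(\alpha) = [r_{ij}] \in \mathbb{R}^{n,n}$ is equal to the identity matrix $I_n$ except for its
entries

\[ r_{ii} = \cos(\alpha), \quad r_{ij} = -\sin(\alpha), \quad r_{ji} = \sin(\alpha), \quad r_{jj} = \cos(\alpha). \]

For $n = 2$ we have the matrix

\[ R_{12}(\alpha) = \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix}, \]

which satisfies

\begin{align*}
R_{12}(\alpha)^T R_{12}(\alpha) &= \begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix} \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix} \\
&= \begin{bmatrix} \cos^2(\alpha) + \sin^2(\alpha) & 0 \\ 0 & \cos^2(\alpha) + \sin^2(\alpha) \end{bmatrix} \\
&= I_2 = R_{12}(\alpha) R_{12}(\alpha)^T.
\end{align*}

One easily sees that each of the matrices $R_{ij}(\alpha) \in \mathbb{R}^{n,n}$ is orthogonal. The multiplication
of a vector $v \in \mathbb{R}^{n,1}$ with the matrix $R_{ij}(\alpha)$ results in a (counterclockwise)
rotation of $v$ by the angle $\alpha$ in the $(i, j)$-coordinate plane. In Numerical Mathematics,
the matrices $R_{ij}(\alpha)$ are called Givens rotations.² This is illustrated in the
figure below for the vector $v = [1.0, 0.75]^T \in \mathbb{R}^{2,1}$ and the matrices $R_{12}(\pi/2)$ and
$R_{12}(2\pi/3)$, which represent rotations by 90 and 120 degrees, respectively.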


[Figure: the vector $v$ and the rotated vectors $R_{12}(\pi/2)v$ and $R_{12}(2\pi/3)v$.]
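
The rotation of $v = [1.0, 0.75]^T$ by 90 degrees can be reproduced with a few lines of Python. The function `givens` below is our own helper for building $R_{ij}(\alpha)$ as a list of rows, with the indices $i < j$ counted from 1 as in the text.

```python
import math


def givens(n, i, j, alpha):
    """Return R_ij(alpha) in R^{n,n} as a list of rows (indices i < j are 1-based)."""
    R = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    R[i - 1][i - 1] = math.cos(alpha)
    R[i - 1][j - 1] = -math.sin(alpha)
    R[j - 1][i - 1] = math.sin(alpha)
    R[j - 1][j - 1] = math.cos(alpha)
    return R


def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


v = [1.0, 0.75]
w = matvec(givens(2, 1, 2, math.pi / 2), v)   # rotation by 90 degrees

# A rotation preserves the Euclidean norm.
norm_v = math.hypot(*v)
norm_w = math.hypot(*w)
print(w, norm_v, norm_w)
```

Up to rounding, the rotated vector is $[-0.75, 1.0]^T$, and both vectors have the same Euclidean norm.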








Example 18.15 For $u \in \mathbb{R}^{n,1} \setminus \{0\}$ we define the Householder matrix

\[ H(u) := I_n - \frac{2}{u^T u}\, u u^T \in \mathbb{R}^{n,n}, \tag{18.2} \]

and for $u = 0$ we set $H(0) := I_n$. For every $u \in \mathbb{R}^{n,1}$ then $H(u)$ is an orthogonal
matrix (cp. Exercise 12.17). The multiplication of a vector $v \in \mathbb{R}^{n,1}$ with the matrix
$H(u)$ describes a reflection of $v$ at the hyperplane

\[ (\operatorname{span}\{u\})^\perp = \{ y \in \mathbb{R}^{n,1} \mid u^T y = 0 \}, \]

i.e., the hyperplane of vectors that are orthogonal to $u$ with respect to the standard
scalar product. This is illustrated in the figure below for the vector $v = [1.75, 0.5]^T \in
\mathbb{R}^{2,1}$ and the Householder matrix

\[ H(u) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]

which corresponds to $u = [-1, 1]^T \in \mathbb{R}^{2,1}$.


²Wallace Givens (1910–1993), pioneer of Numerical Linear Algebra.

[Figure: the vector $v$, its reflection $H(u)v$, and the hyperplane $(\operatorname{span}\{u\})^\perp$.]





MATLAB-Minute.

Let $u = [5, 3, 1]^T \in \mathbb{R}^{3,1}$. Apply the command norm(u) to compute
the Euclidean norm of u and form the Householder matrix
H=eye(3)-(2/(u'*u))*(u*u'). Check the orthogonality of H via the computation of
norm(H'*H-eye(3)). Form the vector v=H*u and compare the Euclidean
norms of u and v.
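
For readers without access to MATLAB, the reflection from Example 18.15 can be reproduced with the following Python sketch. The function `householder` is our own helper that forms $H(u)$ from (18.2) as a list of rows, and the vectors are those of the example.

```python
def householder(u):
    """Return H(u) = I - (2 / u^T u) u u^T as a list of rows; H(0) = I."""
    n = len(u)
    uu = sum(x * x for x in u)
    H = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    if uu == 0:
        return H
    for r in range(n):
        for c in range(n):
            H[r][c] -= 2.0 * u[r] * u[c] / uu
    return H


def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


u = [-1.0, 1.0]
H = householder(u)
v = [1.75, 0.5]
reflected = matvec(H, v)      # swaps the two coordinates for this u
mirrored_u = matvec(H, u)     # u itself is mapped to -u
print(H, reflected, mirrored_u)
```

For this $u$ the reflection interchanges the two coordinates of $v$, and $u$ is mapped to $-u$, as expected for a reflection at the hyperplane orthogonal to $u$.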


18.3 Selfadjoint Endomorphisms 


We have already studied selfadjoint endomorphisms $f$ on a finite dimensional Euclidean
or unitary vector space. The defining property for this class of endomorphisms
is $f = f^{ad}$ (cp. Definition 13.13).

Obviously, selfadjoint endomorphisms are normal and hence the results of
Sect. 18.1 hold. We now strengthen some of these results.


Lemma 18.16 For a finite dimensional Euclidean or unitary vector space $V$ and
$f \in L(V, V)$, the following statements are equivalent:


(1) $f$ is selfadjoint.
(2) For every orthonormal basis $B$ of $V$ we have $[f]_{B,B} = ([f]_{B,B})^H$.
(3) There exists an orthonormal basis $B$ of $V$ with $[f]_{B,B} = ([f]_{B,B})^H$.


(In the Euclidean case $([f]_{B,B})^H = ([f]_{B,B})^T$.)


Proof In Corollary 13.14 we have already shown that (1) implies (2), and obviously
(2) implies (3). If (3) holds, then $[f]_{B,B} = ([f]_{B,B})^H = [f^{ad}]_{B,B}$ (cp. Theorem 13.12),
and hence $f = f^{ad}$, so that (1) holds. □


We have the following strong result on the diagonalizability of selfadjoint endo- 
morphisms in both the Euclidean and the unitary case. 


Theorem 18.17 If $V$ is a finite dimensional Euclidean or unitary vector space and
$f \in L(V, V)$ is selfadjoint, then there exists an orthonormal basis $B$ of $V$ such that
$[f]_{B,B}$ is a real diagonal matrix.


Proof Consider first the unitary case. If $f$ is selfadjoint, then $f$ is normal and hence
unitarily diagonalizable (cp. Theorem 18.2). Let $B$ be an orthonormal basis of $V$ so
that $[f]_{B,B}$ is a diagonal matrix. Then $[f]_{B,B} = [f^{ad}]_{B,B} = ([f]_{B,B})^H$ implies that
the diagonal entries of $[f]_{B,B}$, which are the eigenvalues of $f$, are real.

Let $V$ be an $n$-dimensional Euclidean vector space. If $\widetilde{B} = \{v_1, \dots, v_n\}$ is an
orthonormal basis of $V$, then $[f]_{\widetilde{B},\widetilde{B}}$ is symmetric and in particular normal. By
Corollary 18.6, there exists an orthogonal matrix $U = [u_{ij}] \in \mathbb{R}^{n,n}$ with

\[ U^T [f]_{\widetilde{B},\widetilde{B}}\, U = \operatorname{diag}(R_1, \dots, R_m), \]

where for $j = 1, \dots, m$ either $R_j \in \mathbb{R}^{1,1}$ or

\[ R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with} \quad \beta_j \neq 0. \]

Since $U^T [f]_{\widetilde{B},\widetilde{B}}\, U$ is symmetric, a $2 \times 2$ block $R_j$ with $\beta_j \neq 0$ cannot occur. Thus,
$U^T [f]_{\widetilde{B},\widetilde{B}}\, U$ is a real diagonal matrix.
We define the basis $B = \{w_1, \dots, w_n\}$ of $V$ by

\[ (w_1, \dots, w_n) = (v_1, \dots, v_n) U. \]

Then, by construction, $U = [\mathrm{Id}_V]_{B,\widetilde{B}}$ and hence $U^T = U^{-1} = [\mathrm{Id}_V]_{\widetilde{B},B}$. Therefore,
$U^T [f]_{\widetilde{B},\widetilde{B}}\, U = [f]_{B,B}$. If $\langle \cdot, \cdot \rangle$ is the scalar product on $V$, then $\langle v_i, v_j \rangle = \delta_{ij}$,
$i, j = 1, \dots, n$. With $U^T U = I_n$ we get

\[ \langle w_i, w_j \rangle = \Big\langle \sum_{k=1}^n u_{ki} v_k, \sum_{\ell=1}^n u_{\ell j} v_\ell \Big\rangle = \sum_{k=1}^n \sum_{\ell=1}^n u_{ki} u_{\ell j} \langle v_k, v_\ell \rangle = \sum_{k=1}^n u_{ki} u_{kj} = \delta_{ij}. \]

Hence $B$ is an orthonormal basis of $V$. □

This theorem has the following “matrix version”.


Corollary 18.18 


(1) If $A \in \mathbb{R}^{n,n}$ is symmetric, then there exist an orthogonal matrix $U \in \mathbb{R}^{n,n}$ and a
diagonal matrix $D \in \mathbb{R}^{n,n}$ with $A = U D U^T$.

(2) If $A \in \mathbb{C}^{n,n}$ is Hermitian, then there exist a unitary matrix $U \in \mathbb{C}^{n,n}$ and a
diagonal matrix $D \in \mathbb{R}^{n,n}$ with $A = U D U^H$.

The statement (1) in this corollary is known as the principal axes transformation.
We will briefly discuss the background of this name from the theory of bilinear forms
and their applications in geometry. A symmetric matrix $A = [a_{ij}] \in \mathbb{R}^{n,n}$ defines a
symmetric bilinear form on $\mathbb{R}^{n,1}$ via

\[ \beta_A : \mathbb{R}^{n,1} \times \mathbb{R}^{n,1} \to \mathbb{R}, \quad (x, y) \mapsto y^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i y_j. \]

The map

\[ q_A : \mathbb{R}^{n,1} \to \mathbb{R}, \quad x \mapsto \beta_A(x, x) = x^T A x, \]

is called the quadratic form associated with this symmetric bilinear form.

Since $A$ is symmetric, there exists an orthogonal matrix $U = [u_1, \dots, u_n]$ such
that $U^T A U = D$ is a real diagonal matrix. If $B_1 = \{e_1, \dots, e_n\}$, then $[\beta_A]_{B_1 \times B_1} = A$.
The set $B_2 = \{u_1, \dots, u_n\}$ forms an orthonormal basis of $\mathbb{R}^{n,1}$ with respect to the
standard scalar product, and $[u_1, \dots, u_n] = [e_1, \dots, e_n] U$, hence $U = [\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1}$.
For the change of bases from $B_1$ to $B_2$ we obtain

\[ [\beta_A]_{B_2 \times B_2} = \big( [\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1} \big)^T [\beta_A]_{B_1 \times B_1} [\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1} = U^T A U = D \]

(cp. Theorem 11.14). Thus, the real diagonal matrix $D$ represents the bilinear form
$\beta_A$ defined by $A$ with respect to the basis $B_2$.

The quadratic form $q_A$ associated with $\beta_A$ is also transformed to a simpler form
by this change of bases, since analogously, with $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$,

\[ q_A(x) = x^T A x = x^T U D U^T x = y^T D y = \sum_{i=1}^n \lambda_i y_i^2 = q_D(y), \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} := U^T x. \]

Thus, the quadratic form $q_A$ is turned into a “sum of squares”, defined by the quadratic
form $q_D$.

The principal axes transformation is given by the change of bases from the canonical
basis of $\mathbb{R}^{n,1}$ to the basis given by the pairwise orthonormal eigenvectors of $A$ in
$\mathbb{R}^{n,1}$. The $n$ pairwise orthogonal subspaces $\operatorname{span}\{u_j\}$, $j = 1, \dots, n$, form the $n$ principal
axes. The geometric interpretation of this term is illustrated in the following
example.

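Example 18.19 For the symmetric matrix

\[ A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix} \in \mathbb{R}^{2,2} \]

we have

\[ U^T A U = \begin{bmatrix} 3 + \sqrt{2} & 0 \\ 0 & 3 - \sqrt{2} \end{bmatrix} = D, \]

with the orthogonal matrix $U = [u_1, u_2] \in \mathbb{R}^{2,2}$ and

\[ u_1 = \begin{bmatrix} c \\ s \end{bmatrix}, \quad u_2 = \begin{bmatrix} -s \\ c \end{bmatrix}, \quad \text{where} \quad c = \frac{1 + \sqrt{2}}{\sqrt{(1 + \sqrt{2})^2 + 1}} = 0.9239, \quad s = \frac{1}{\sqrt{(1 + \sqrt{2})^2 + 1}} = 0.3827. \]

(The numbers here are rounded to the fourth significant digit.) With the associated quadratic form $q_A(x) = 4x_1^2 + 2x_1 x_2 + 2x_2^2$, we define the set

\[ E_A = \{ x \in \mathbb{R}^{2,1} \mid q_A(x) - 1 = 0 \}. \]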


As described above, the principal axes transformation consists in the transformation from the canonical coordinate system to a coordinate system given by an orthonormal basis of eigenvectors of $A$. If we carry out this transformation and replace $q_A$ by the quadratic form $q_D$, we get the set

\[ E_D = \{ y \in \mathbb{R}^{2,1} \mid q_D(y) - 1 = 0 \} = \left\{ [y_1, y_2]^T \in \mathbb{R}^{2,1} \;\Big|\; \frac{y_1^2}{\beta_1^2} + \frac{y_2^2}{\beta_2^2} - 1 = 0 \right\}, \]

where

\[ \beta_1 = \frac{1}{\sqrt{3 + \sqrt{2}}} = 0.4760, \quad \beta_2 = \frac{1}{\sqrt{3 - \sqrt{2}}} = 0.7941. \]


This set forms the ellipse centered at the origin of the two dimensional Cartesian coordinate system (spanned by the canonical basis vectors $e_1, e_2$) with axes of lengths $\beta_1$ and $\beta_2$, which is illustrated on the left part of the following figure:

[Figure: the ellipse $E_D$ (left) and the ellipse $E_A$ (right).]

Since $q_A(x) = q_D(U^T x)$, the elements of $E_A$ are exactly the vectors $x = Uy$ with $y \in E_D$. The orthogonal matrix

\[ U = \begin{bmatrix} c & -s \\ s & c \end{bmatrix} \]

is a Givens rotation that rotates the ellipse $E_D$ counterclockwise by the angle $\cos^{-1}(c) = 0.3926$ (approximately 22.5 degrees). Hence $E_A$ is just a “rotated version” of $E_D$. The right part of the figure above shows the ellipse $E_A$ in the Cartesian coordinate system. The dashed lines indicate the respective spans of the vectors $u_1$ and $u_2$, which are eigenvectors of $A$; these spans are the principal axes of the ellipse $E_A$.
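The numbers in Example 18.19 can be checked with a few lines of code. The following is only a sketch in plain Python; the helper names `matmul` and `transpose` are illustrative and not part of the text, and the matrix $A$ is the one read off from the quadratic form $q_A$.

```python
import math

# Matrix of Example 18.19, read off from q_A(x) = 4*x1^2 + 2*x1*x2 + 2*x2^2.
A = [[4.0, 1.0],
     [1.0, 2.0]]

# Entries c, s of the Givens rotation U = [[c, -s], [s, c]].
r = math.sqrt((1.0 + math.sqrt(2.0)) ** 2 + 1.0)
c = (1.0 + math.sqrt(2.0)) / r
s = 1.0 / r
U = [[c, -s],
     [s, c]]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

# D = U^T A U should be diagonal with entries 3 + sqrt(2) and 3 - sqrt(2).
D = matmul(transpose(U), matmul(A, U))

# Axis lengths beta_j = 1 / sqrt(lambda_j) of the ellipse E_D.
beta = [1.0 / math.sqrt(D[0][0]), 1.0 / math.sqrt(D[1][1])]

# Rotation angle of the Givens rotation in radians (pi/8, i.e., 22.5 degrees).
angle = math.acos(c)
```

Up to rounding errors the off-diagonal entries of `D` vanish, and `beta` reproduces the values of $\beta_1$ and $\beta_2$ given in the example.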


Let $A \in \mathbb{R}^{n,n}$ be symmetric. For a given vector $v \in \mathbb{R}^{n,1}$ and a scalar $\alpha \in \mathbb{R}$,

\[ Q(x) = x^T A x + v^T x + \alpha, \quad x \in \mathbb{R}^{n,1}, \]

is a quadratic function in $n$ variables (the entries of the vector $x$). The set of zeros of this function, i.e., the set $\{ x \in \mathbb{R}^{n,1} \mid Q(x) = 0 \}$, is called a hypersurface of degree 2 or a quadric. In Example 18.19 we have already seen quadrics in the case $n = 2$ and with $v = 0$. We next give some further examples.


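Example 18.20
(1) Let $n = 3$, $A = I_3$, $v = [0, 0, 0]^T$ and $\alpha = -1$. The corresponding quadric

\[ \{ [x_1, x_2, x_3]^T \in \mathbb{R}^{3,1} \mid x_1^2 + x_2^2 + x_3^2 - 1 = 0 \} \]

is the surface of the ball with radius 1 around the origin.

[Figure: the unit sphere.]

(2) Let $n = 2$, $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $v = [0, 2]^T$ and $\alpha = 0$. The corresponding quadric

\[ \{ [x_1, x_2]^T \in \mathbb{R}^{2,1} \mid x_1^2 + 2x_2 = 0 \} \]

is a parabola.

[Figure: the parabola in the $(x_1, x_2)$-plane.]

(3) Let $n = 3$, $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $v = [0, 2, 0]^T$ and $\alpha = 0$. The corresponding quadric

\[ \{ [x_1, x_2, x_3]^T \in \mathbb{R}^{3,1} \mid x_1^2 + 2x_2 = 0 \} \]

is a parabolic cylinder.

[Figure: the parabolic cylinder.]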








Corollary 18.18 motivates the following definition. 


Definition 18.21 If $A \in \mathbb{R}^{n,n}$ is symmetric or $A \in \mathbb{C}^{n,n}$ is Hermitian with $n_+$ positive, $n_-$ negative and $n_0$ zero eigenvalues (counted with their corresponding multiplicities), then the triple $(n_+, n_-, n_0)$ is called the inertia of $A$.


Let us first consider, for simplicity, only the case of real symmetric matrices. 


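Lemma 18.22 If $A \in \mathbb{R}^{n,n}$ symmetric has the inertia $(n_+, n_-, n_0)$, then $A$ and $S_A = \mathrm{diag}(I_{n_+}, -I_{n_-}, 0_{n_0})$ are congruent.


Proof Let $A \in \mathbb{R}^{n,n}$ be symmetric and let $A = U \Lambda U^T$ with an orthogonal matrix $U \in \mathbb{R}^{n,n}$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^{n,n}$. If $A$ has the inertia $(n_+, n_-, n_0)$, then we can assume without loss of generality that

\[ \Lambda = \mathrm{diag}(\Lambda_{n_+}, \Lambda_{n_-}, 0_{n_0}), \]

where the diagonal matrices $\Lambda_{n_+}$ and $\Lambda_{n_-}$ contain the positive and negative eigenvalues of $A$, respectively, and $0_{n_0} \in \mathbb{R}^{n_0, n_0}$. We have $\Lambda = \Delta S_A \Delta$, where

\[ S_A := \mathrm{diag}(I_{n_+}, -I_{n_-}, 0_{n_0}) \in \mathbb{R}^{n,n}, \quad \Delta := \mathrm{diag}\big( (\Lambda_{n_+})^{1/2}, (-\Lambda_{n_-})^{1/2}, I_{n_0} \big) \in GL_n(\mathbb{R}). \]

Here $(\mathrm{diag}(\mu_1, \dots, \mu_m))^{1/2} = \mathrm{diag}(\sqrt{\mu_1}, \dots, \sqrt{\mu_m})$ and thus

\[ A = U \Lambda U^T = U \Delta S_A \Delta U^T = (U \Delta) S_A (U \Delta)^T. \qquad \Box \]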


This result will be used in the proof of Sylvester’s law of inertia.³


Theorem 18.23 The inertia of a symmetric matrix $A \in \mathbb{R}^{n,n}$ is invariant under congruence, i.e., for every matrix $G \in GL_n(\mathbb{R})$ the matrices $A$ and $G^T A G$ have the same inertia.


Proof The assertion is trivial for $A = 0$. Let $A \neq 0$ have the inertia $(n_+, n_-, n_0)$, then not both $n_+$ and $n_-$ can be equal to zero. We assume without loss of generality that $n_+ > 0$. (If $n_+ = 0$, then the following argument can be applied for $n_- > 0$.)

By Lemma 18.22 there exist $G_1 \in GL_n(\mathbb{R})$ and $S_A = \mathrm{diag}(I_{n_+}, -I_{n_-}, 0_{n_0})$ with $A = G_1^T S_A G_1$. Let $G_2 \in GL_n(\mathbb{R})$ be arbitrary and set $B := G_2^T A G_2$. Then $B$ is symmetric and has an inertia $(\tilde{n}_+, \tilde{n}_-, \tilde{n}_0)$. Therefore, $B = G_3^T S_B G_3$ for $S_B = \mathrm{diag}(I_{\tilde{n}_+}, -I_{\tilde{n}_-}, 0_{\tilde{n}_0})$ and a matrix $G_3 \in GL_n(\mathbb{R})$. If we show that $n_+ = \tilde{n}_+$ and $n_0 = \tilde{n}_0$, then also $n_- = \tilde{n}_-$.

We have

\[ A = (G_2^{-1})^T B \, G_2^{-1} = (G_2^{-1})^T G_3^T S_B G_3 G_2^{-1} = G_4^T S_B G_4, \quad G_4 := G_3 G_2^{-1}, \]

and $G_4 \in GL_n(\mathbb{R})$ implies that $\mathrm{rank}(A) = \mathrm{rank}(S_B) = \mathrm{rank}(B)$, hence $n_0 = \tilde{n}_0$. We set

\[ G_1^{-1} = [u_1, \dots, u_{n_+}, v_1, \dots, v_{n_-}, w_1, \dots, w_{n_0}] \quad \text{and} \quad G_4^{-1} = [\tilde{u}_1, \dots, \tilde{u}_{\tilde{n}_+}, \tilde{v}_1, \dots, \tilde{v}_{\tilde{n}_-}, \tilde{w}_1, \dots, \tilde{w}_{n_0}]. \]

Let $V_1 := \mathrm{span}\{u_1, \dots, u_{n_+}\}$ and $V_2 := \mathrm{span}\{\tilde{v}_1, \dots, \tilde{v}_{\tilde{n}_-}, \tilde{w}_1, \dots, \tilde{w}_{n_0}\}$. Since $n_+ > 0$, we have $\dim(V_1) \geq 1$. If $x \in V_1 \setminus \{0\}$, then

\[ x = \sum_{j=1}^{n_+} \alpha_j u_j = G_1^{-1} [\alpha_1, \dots, \alpha_{n_+}, 0, \dots, 0]^T \]

for some $\alpha_1, \dots, \alpha_{n_+} \in \mathbb{R}$ that are not all zero. This implies

\[ x^T A x = \sum_{j=1}^{n_+} \alpha_j^2 > 0. \]

If, on the other hand, $x \in V_2$, then an analogous argument shows that $x^T A x \leq 0$. Hence $V_1 \cap V_2 = \{0\}$, and the dimension formula for subspaces (cp. Theorem 9.29) yields

\[ \underbrace{\dim(V_1)}_{= n_+} + \underbrace{\dim(V_2)}_{= n - \tilde{n}_+} - \underbrace{\dim(V_1 \cap V_2)}_{= 0} = \dim(V_1 + V_2) \leq \dim(\mathbb{R}^{n,1}) = n, \]

and thus $n_+ \leq \tilde{n}_+$. If we repeat the same construction by interchanging the roles of $n_+$ and $\tilde{n}_+$, then $\tilde{n}_+ \leq n_+$. Thus, $n_+ = \tilde{n}_+$ and the proof is complete. $\Box$

³ James Joseph Sylvester (1814-1897) proved this result for quadratic forms in 1852. He also coined the name law of inertia which according to him is “expressing the fact of the existence of an invariable number inseparably attached to such [bilinear] forms”.
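A small worked instance may help to see the statement of Theorem 18.23 at work. The matrices $A$ and $G$ below are chosen here purely for illustration and do not come from the text.

\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad G = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \in GL_2(\mathbb{R}), \quad G^T A G = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}. \]

The eigenvalues of $A$ are $3$ and $-1$, those of $G^T A G$ are $1$ and $-3$. The eigenvalues differ, but both matrices have the inertia $(1, 1, 0)$, as the theorem predicts.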


In the following result we transfer Lemma 18.22 and Theorem 18.23 to complex 
Hermitian matrices. 


Theorem 18.24 Let $A \in \mathbb{C}^{n,n}$ be Hermitian with the inertia $(n_+, n_-, n_0)$. Then there exists a matrix $G \in GL_n(\mathbb{C})$ with

\[ A = G^H \, \mathrm{diag}(I_{n_+}, -I_{n_-}, 0_{n_0}) \, G. \]

Moreover, for every matrix $G \in GL_n(\mathbb{C})$ the matrices $A$ and $G^H A G$ have the same inertia.


Proof Exercise. $\Box$
Finally, we discuss a special class of symmetric and Hermitian matrices. 


Definition 18.25 A real symmetric or complex Hermitian $n \times n$ matrix $A$ is called

(1) positive semidefinite, if $v^H A v \geq 0$ for all $v \in \mathbb{R}^{n,1}$ resp. $v \in \mathbb{C}^{n,1}$,
(2) positive definite, if $v^H A v > 0$ for all $v \in \mathbb{R}^{n,1} \setminus \{0\}$ resp. $v \in \mathbb{C}^{n,1} \setminus \{0\}$.

If in (1) or (2) the reverse inequality holds, then the corresponding matrices are called negative semidefinite or negative definite, respectively.


For selfadjoint endomorphisms we define analogously: If $V$ is a finite dimensional Euclidean or unitary vector space with the scalar product $\langle \cdot, \cdot \rangle$ and if $f \in L(V, V)$ is selfadjoint, then $f$ is called positive semidefinite or positive definite, if $\langle f(v), v \rangle \geq 0$ for all $v \in V$ resp. $\langle f(v), v \rangle > 0$ for all $v \in V \setminus \{0\}$.

The following theorem characterizes symmetric positive definite matrices; see 
Exercise 18.19 and Exercise 18.20 for the transfer of the results to positive semidef- 
inite matrices resp. positive definite endomorphisms. 


Theorem 18.26 If $A \in \mathbb{R}^{n,n}$ is symmetric, then the following statements are equivalent:

(1) $A$ is positive definite.
(2) All eigenvalues of $A$ are real and positive.
(3) There exists a lower triangular matrix $L \in GL_n(\mathbb{R})$ with $A = L L^T$.


Proof

(1) $\Rightarrow$ (2): The symmetric matrix $A$ is diagonalizable with real eigenvalues (cp. (1) in Corollary 18.18). If $\lambda$ is an eigenvalue with associated eigenvector $v$, i.e., $Av = \lambda v$, then $\lambda v^T v = v^T A v > 0$ and $v^T v > 0$ implies that $\lambda > 0$.

(2) $\Rightarrow$ (1): Let $A = U^T \mathrm{diag}(\lambda_1, \dots, \lambda_n) \, U$ be a diagonalization of $A$ with an orthogonal matrix $U \in \mathbb{R}^{n,n}$ (cp. (1) in Corollary 18.18) and $\lambda_j > 0$, $j = 1, \dots, n$. Let $v \in \mathbb{R}^{n,1} \setminus \{0\}$ be arbitrary and let $w := Uv$. Then $w \neq 0$ and $v = U^T w$, so that

\[ v^T A v = (U^T w)^T U^T \mathrm{diag}(\lambda_1, \dots, \lambda_n) \, U (U^T w) = w^T \mathrm{diag}(\lambda_1, \dots, \lambda_n) \, w = \sum_{j=1}^{n} \lambda_j w_j^2 > 0. \]

(3) $\Rightarrow$ (1): If $A = L L^T$ with $L \in GL_n(\mathbb{R})$, then for every $v \in \mathbb{R}^{n,1} \setminus \{0\}$ we have

\[ v^T A v = v^T L L^T v = \| L^T v \|_2^2 > 0, \]

since $L^T$ is invertible. (Note that here we do not need that $L$ is lower triangular.)

(1) $\Rightarrow$ (3): Let $A = U^T \mathrm{diag}(\lambda_1, \dots, \lambda_n) \, U$ be a diagonalization of $A$ with an orthogonal matrix $U \in \mathbb{R}^{n,n}$ (cp. (1) in Corollary 18.18). Since $A$ is positive definite, we know from (2) that $\lambda_j > 0$, $j = 1, \dots, n$. We set

\[ \Lambda^{1/2} := \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}) \]

and then have $A = (U^T \Lambda^{1/2})(\Lambda^{1/2} U) =: B^T B$. Let $B = QR$ be a QR-decomposition of the invertible matrix $B$ (cp. Corollary 12.12), where $Q \in \mathbb{R}^{n,n}$ is orthogonal and $R \in \mathbb{R}^{n,n}$ is an invertible upper triangular matrix. Then $A = B^T B = (QR)^T (QR) = R^T R = L L^T$, where $L := R^T$. $\Box$


One easily sees that an analogous result holds for complex Hermitian matrices $A \in \mathbb{C}^{n,n}$. In this case in assertion (3) the lower triangular matrix is $L \in GL_n(\mathbb{C})$ with $A = L L^H$.

The factorization $A = L L^T$ in (3) is called a Cholesky factorization⁴ of $A$. It is a special case of the LU-decomposition in Theorem 5.4. In fact, Theorem 18.26 shows that an LU-decomposition of a (real) symmetric positive definite matrix can be computed without row permutations.


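⁴ André-Louis Cholesky (1875-1918).

In order to compute the Cholesky factorization of the symmetric positive definite matrix $A = [a_{ij}] \in \mathbb{R}^{n,n}$, we consider the equation

\[ A = L L^T = \begin{bmatrix} l_{11} & & \\ \vdots & \ddots & \\ l_{n1} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} l_{11} & \cdots & l_{n1} \\ & \ddots & \vdots \\ & & l_{nn} \end{bmatrix}. \]

For the first row of $A$ we obtain

\[ a_{11} = l_{11}^2 \;\Longrightarrow\; l_{11} = \sqrt{a_{11}}, \tag{18.3} \]
\[ a_{1j} = l_{11} l_{j1} \;\Longrightarrow\; l_{j1} = \frac{a_{1j}}{l_{11}}, \quad j = 2, \dots, n. \tag{18.4} \]

Analogously, for the rows $i = 2, \dots, n$ of $A$ we obtain

\[ a_{ii} = \sum_{j=1}^{i} l_{ij} l_{ij} \;\Longrightarrow\; l_{ii} = \Big( a_{ii} - \sum_{j=1}^{i-1} l_{ij}^2 \Big)^{1/2}, \tag{18.5} \]
\[ a_{ij} = \sum_{k=1}^{n} l_{ik} l_{jk} = \sum_{k=1}^{i} l_{ik} l_{jk} = \sum_{k=1}^{i-1} l_{ik} l_{jk} + l_{ii} l_{ji} \;\Longrightarrow\; l_{ji} = \frac{1}{l_{ii}} \Big( a_{ij} - \sum_{k=1}^{i-1} l_{ik} l_{jk} \Big), \quad \text{for } j > i. \tag{18.6} \]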

The symmetric or Hermitian positive definite matrices are closely related to the positive definite bilinear forms on Euclidean or unitary vector spaces.


Theorem 18.27 If $V$ is a finite dimensional Euclidean or unitary vector space and if $\beta$ is a symmetric or Hermitian bilinear form on $V$, respectively, then the following statements are equivalent:

(1) $\beta$ is positive definite, i.e., $\beta(v, v) > 0$ for all $v \in V \setminus \{0\}$.

(2) For every basis $B$ of $V$ the matrix representation $[\beta]_{B \times B}$ is (symmetric or Hermitian) positive definite.

(3) There exists a basis $B$ of $V$ such that the matrix representation $[\beta]_{B \times B}$ is (symmetric or Hermitian) positive definite.


Proof Exercise. $\Box$
Exercises 


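18.1 Let $A \in \mathbb{R}^{n,n}$ be normal. Show that $\alpha A$ for every $\alpha \in \mathbb{R}$, $A^k$ for every $k \in \mathbb{N}_0$, and $p(A)$ for every $p \in \mathbb{R}[t]$ are normal.

18.2 Let $A, B \in \mathbb{R}^{n,n}$ be normal. Are $A + B$ and $AB$ then normal as well?

18.3 Let $A \in \mathbb{R}^{2,2}$ be normal but not symmetric. Show that then

\[ A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} \]

for some $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R} \setminus \{0\}$.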

18.4 Prove Corollary 18.6 using Theorem 18.5. 

18.5 Show that real skew-symmetric matrices (i.e., matrices with $A = -A^T \in \mathbb{R}^{n,n}$) and complex skew-Hermitian matrices (i.e., matrices with $A = -A^H \in \mathbb{C}^{n,n}$) are normal.

18.6 Let $V$ be a finite dimensional unitary vector space and let $f \in L(V, V)$ be normal. Show the following assertions:

(a) If $f = f^2$, then $f$ is selfadjoint.
(b) If $f^2 = f^3$, then $f = f^2$.
(c) If $f$ is nilpotent, then $f = 0$.

18.7 Let $V$ be a finite dimensional real or complex vector space and let $f \in L(V, V)$ be diagonalizable. Show that there exists a scalar product on $V$ such that $f$ is normal with respect to this scalar product.

18.8 Let $A \in \mathbb{C}^{n,n}$. Show the following assertions:

(a) $A$ is normal if and only if there exists a normal matrix $B$ with $n$ distinct eigenvalues that commutes with $A$.

(b) $A$ is normal if and only if $A + \alpha I$ is normal for every $\alpha \in \mathbb{C}$.

(c) Let $H(A) := \frac{1}{2}(A + A^H)$ be the Hermitian and $S(A) := \frac{1}{2}(A - A^H)$ the skew-Hermitian part of $A$. Show that $A = H(A) + S(A)$, $H(A)^H = H(A)$ and $S(A)^H = -S(A)$. Show, furthermore, that $A$ is normal if and only if $H(A)$ and $S(A)$ commute.

18.9 Show that if $A \in \mathbb{C}^{n,n}$ is normal and if $f(z) = \frac{az + b}{cz + d}$ with $ad - bc \neq 0$ is defined on the spectrum of $A$, then $f(A) = (aA + bI)(cA + dI)^{-1}$. (The map $f(z)$ is called a Möbius transformation.⁵ Such transformations play an important role in Function Theory and in many other areas of Mathematics.)

18.10 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in L(V, V)$ be orthogonal or unitary, respectively. Show that $f^{-1}$ exists and is again orthogonal or unitary, respectively.
18.11 Let $u \in \mathbb{R}^{n,1}$ and let the Householder matrix $H(u)$ be defined as in (18.2). Show the following assertions:

(a) For $u \neq 0$ the matrices $H(u)$ and $[-e_1, e_2, \dots, e_n]$ are orthogonally similar, i.e., there exists an orthogonal matrix $Q \in \mathbb{R}^{n,n}$ with

\[ Q^T H(u) Q = [-e_1, e_2, \dots, e_n]. \]

(This implies that $H(u)$ only has the eigenvalues $1$ and $-1$ with the algebraic multiplicities $n - 1$ and $1$, respectively.)

(b) Every orthogonal matrix $A \in \mathbb{R}^{n,n}$ can be written as product of $n$ Householder matrices, i.e., there exist $u_1, \dots, u_n \in \mathbb{R}^{n,1}$ with $A = H(u_1) \cdots H(u_n)$.

⁵ August Ferdinand Möbius (1790-1868).

18.12 Let $v \in \mathbb{R}^{n,1}$ satisfy $v^T v = 1$. Show that there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with $Uv = e_1$.

18.13 Transfer the proofs of Lemma 18.22 and Theorem 18.23 to complex Hermitian matrices and thus show Theorem 18.24.

18.14 Determine for the symmetric matrix


a=" A 


an orthogonal matrix $U \in \mathbb{R}^{2,2}$ such that $U^T A U$ is diagonal. Is $A$ positive (semi-)definite?

18.15 Let $K \in \{\mathbb{R}, \mathbb{C}\}$ and let $\{v_1, \dots, v_n\}$ be a basis of $K^{n,1}$. Prove or disprove: A matrix $A = A^H \in K^{n,n}$ is positive definite if and only if $v_j^H A v_j > 0$ for all $j = 1, \dots, n$.

18.16 Use Definition 18.25 to test whether the symmetric matrices


il 12 Jj T 
ab bih [ra] © 


are positive (semi-)definite. Determine in all cases the inertia.

18.17 Let

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{bmatrix} \in \mathbb{R}^{n,n} \]

with $A_{11} = A_{11}^T \in GL_m(\mathbb{R})$, $A_{12} \in \mathbb{R}^{m,n-m}$ and $A_{22} = A_{22}^T \in \mathbb{R}^{n-m,n-m}$. The matrix $S := A_{22} - A_{12}^T A_{11}^{-1} A_{12} \in \mathbb{R}^{n-m,n-m}$ is called the Schur complement⁶ of $A_{11}$ in $A$. Show that $A$ is positive definite if $A_{11}$ and $S$ are positive definite. (For the Schur complement, see also Exercise 4.17.)

⁶ Issai Schur (1875-1941).

18.18 Show that $A \in \mathbb{C}^{n,n}$ is Hermitian positive definite if and only if $\langle x, y \rangle = y^H A x$ defines a scalar product on $\mathbb{C}^{n,1}$.

18.19 Prove the following version of Theorem 18.26 for positive semidefinite matrices.

If $A \in \mathbb{R}^{n,n}$ is symmetric, then the following statements are equivalent:

(1) $A$ is positive semidefinite.
(2) All eigenvalues of $A$ are real and nonnegative.
(3) There exists an upper triangular matrix $L \in \mathbb{R}^{n,n}$ with $A = L L^T$.

18.20 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in L(V, V)$ be selfadjoint. Show that $f$ is positive definite if and only if all eigenvalues of $f$ are real and positive.

18.21 Let $A \in \mathbb{R}^{n,n}$. A matrix $X \in \mathbb{R}^{n,n}$ with $X^2 = A$ is called a square root of $A$ (cp. Sect. 17.1).

(a) Show that a symmetric positive definite matrix $A \in \mathbb{R}^{n,n}$ has a symmetric positive definite square root.
(b) Show that the matrix

\[ A = \begin{bmatrix} 33 & 6 & 6 \\ 6 & 24 & -12 \\ 6 & -12 & 24 \end{bmatrix} \]

is symmetric positive definite and compute a symmetric positive definite square root of $A$.
(c) Show that the matrix $A = J_n(0)$, $n \geq 2$, does not have a square root.


18.22 Show that the matrix

\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \in \mathbb{R}^{3,3} \]

is positive definite and compute a Cholesky factorization of $A$ using (18.3)-(18.6).

18.23 Let $A, B \in \mathbb{C}^{n,n}$ be Hermitian and let $B$ be furthermore positive definite. Show that the polynomial $\det(tB - A) \in \mathbb{C}[t]_{\leq n}$ has exactly $n$ real roots.

18.24 Prove Theorem 18.27.


Chapter 19 
The Singular Value Decomposition 


The matrix decomposition introduced in this chapter is very important in many 
practical applications, since it yields the best possible approximation (in a certain 
sense) of a given matrix by a matrix of low rank. A low rank approximation can be 
considered a “compression” of the data represented by the given matrix. We illustrate 
this below with an example from image processing. 

We first prove the existence of the decomposition. 


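Theorem 19.1 Let $A \in \mathbb{C}^{n,m}$ with $n \geq m$ be given. Then there exist unitary matrices $V \in \mathbb{C}^{n,n}$ and $W \in \mathbb{C}^{m,m}$ such that

\[ A = V \Sigma W^H \quad \text{with} \quad \Sigma = \begin{bmatrix} \Sigma_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix} \in \mathbb{R}^{n,m}, \quad \Sigma_r = \mathrm{diag}(\sigma_1, \dots, \sigma_r), \tag{19.1} \]

where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ and $r = \mathrm{rank}(A)$.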


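Proof If $A = 0$, then we set $V = I_n$, $\Sigma = 0 \in \mathbb{C}^{n,m}$, $\Sigma_r = [\,]$, $W = I_m$, and we are finished.

Let $A \neq 0$ and $r := \mathrm{rank}(A)$. Since $n \geq m$, we have $1 \leq r \leq m$, and since $A^H A \in \mathbb{C}^{m,m}$ is Hermitian, there exists a unitary matrix $W = [w_1, \dots, w_m] \in \mathbb{C}^{m,m}$ with

\[ W^H (A^H A) W = \mathrm{diag}(\lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m,m} \]

(cp. (2) in Corollary 18.18). Without loss of generality we assume that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. For every $j = 1, \dots, m$ then $A^H A w_j = \lambda_j w_j$, and hence

\[ \lambda_j w_j^H w_j = w_j^H A^H A w_j = \| A w_j \|_2^2 \geq 0, \]

i.e., $\lambda_j \geq 0$ for $j = 1, \dots, m$. Then $\mathrm{rank}(A^H A) = \mathrm{rank}(A) = r$ (to see this, modify the proof of Lemma 10.25 for the complex case). Therefore, the matrix $A^H A$ has exactly $r$ positive eigenvalues $\lambda_1, \dots, \lambda_r$ and $m - r$ times the eigenvalue $0$. We then define $\sigma_j := \lambda_j^{1/2}$, $j = 1, \dots, r$, and have $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$. Let $\Sigma_r$ be as in (19.1),

\[ D := \begin{bmatrix} \Sigma_r & 0 \\ 0 & I_{m-r} \end{bmatrix} \in GL_m(\mathbb{R}), \quad X = [x_1, \dots, x_m] := A W D^{-1}, \]

\[ V_r := [x_1, \dots, x_r], \quad \text{and} \quad Z := [x_{r+1}, \dots, x_m]. \]

Then

\[ \begin{bmatrix} V_r^H V_r & V_r^H Z \\ Z^H V_r & Z^H Z \end{bmatrix} = \begin{bmatrix} V_r^H \\ Z^H \end{bmatrix} [V_r, Z] = X^H X = D^{-1} W^H A^H A W D^{-1} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \]

which implies, in particular, that $Z = 0$ and $V_r^H V_r = I_r$. We extend the vectors $x_1, \dots, x_r$ to an orthonormal basis $\{x_1, \dots, x_r, \tilde{x}_{r+1}, \dots, \tilde{x}_n\}$ of $\mathbb{C}^{n,1}$ with respect to the standard scalar product. Then the matrix

\[ V := [V_r, \tilde{x}_{r+1}, \dots, \tilde{x}_n] \in \mathbb{C}^{n,n} \]

is unitary. From $X = A W D^{-1}$ and $X = [V_r, Z] = [V_r, 0]$ we finally obtain $A = [V_r, 0] D W^H$ and $A = V \Sigma W^H$ with $\Sigma$ as in (19.1). $\Box$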


As the proof shows, Theorem 19.1 can be formulated analogously for real matrices $A \in \mathbb{R}^{n,m}$ with $n \geq m$. In this case the two matrices $V$ and $W$ are orthogonal. If $n < m$ we can apply the theorem to $A^H$ (resp. $A^T$ in the real case).


Definition 19.2 A decomposition of the form (19.1) is called a singular value decomposition or short SVD¹ of the matrix $A$. The diagonal entries of the matrix $\Sigma_r$ are called singular values and the columns of $V$ resp. $W$ are called left resp. right singular vectors of $A$.


From (19.1) we obtain the unitary diagonalizations of the matrices $A^H A$ and $A A^H$,

\[ A^H A = W \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} W^H \quad \text{and} \quad A A^H = V \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} V^H, \]

where the zero blocks are sized such that the middle matrices are of size $m \times m$ and $n \times n$, respectively.

The singular values of $A$ are therefore uniquely determined as the positive square roots of the positive eigenvalues of $A^H A$ or $A A^H$. The unitary matrices $V$ and $W$ in the singular value decomposition, however, are (as the eigenvectors in general) not uniquely determined.


¹ In the development of this decomposition from special cases in the middle of the 19th century to its current general form many important players of the history of Linear Algebra played a role. In the historical notes concerning the singular value decomposition in [HorJ91] one finds contributions of Jordan (1873), Sylvester (1889/1890) and Schmidt (1907). The current form was shown in 1939 by Carl Henry Eckart (1902-1973) and Gale Young.

If we write the SVD of $A$ in the form

\[ A = V \Sigma W^H = \left( V \begin{bmatrix} I_m \\ 0_{n-m,m} \end{bmatrix} W^H \right) \left( W \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0_{m-r} \end{bmatrix} W^H \right) =: U P, \]

then $U \in \mathbb{C}^{n,m}$ has orthonormal columns, i.e., $U^H U = I_m$, and $P = P^H \in \mathbb{C}^{m,m}$ is positive semidefinite with the inertia $(r, 0, m - r)$. The factorization $A = UP$ is called a polar decomposition of $A$. It can be viewed as a generalization of the polar representation of complex numbers, $z = e^{i\varphi} |z|$.


Lemma 19.3 Suppose that the matrix $A \in \mathbb{C}^{n,m}$ with $\mathrm{rank}(A) = r$ has an SVD of the form (19.1) with $V = [v_1, \dots, v_n]$ and $W = [w_1, \dots, w_m]$. Considering $A$ as an element of $L(\mathbb{C}^{m,1}, \mathbb{C}^{n,1})$, we then have $\mathrm{im}(A) = \mathrm{span}\{v_1, \dots, v_r\}$ and $\ker(A) = \mathrm{span}\{w_{r+1}, \dots, w_m\}$.


Proof For $j = 1, \dots, r$ we have $A w_j = V \Sigma W^H w_j = V \Sigma e_j = \sigma_j v_j \neq 0$, since $\sigma_j \neq 0$. Hence these $r$ linearly independent vectors satisfy $v_1, \dots, v_r \in \mathrm{im}(A)$. Now $r = \mathrm{rank}(A) = \dim(\mathrm{im}(A))$ implies that $\mathrm{im}(A) = \mathrm{span}\{v_1, \dots, v_r\}$.

For $j = r + 1, \dots, m$ we have $A w_j = 0$, and hence these $m - r$ linearly independent vectors satisfy $w_{r+1}, \dots, w_m \in \ker(A)$. Then $\dim(\ker(A)) = m - \dim(\mathrm{im}(A)) = m - r$ implies that $\ker(A) = \mathrm{span}\{w_{r+1}, \dots, w_m\}$. $\Box$


An SVD of the form (19.1) can be written as

\[ A = \sum_{j=1}^{r} \sigma_j v_j w_j^H. \]

Thus, $A$ can be written as a sum of $r$ matrices of the form $\sigma_j v_j w_j^H$, where $\mathrm{rank}(\sigma_j v_j w_j^H) = 1$. Let

\[ A_k := \sum_{j=1}^{k} \sigma_j v_j w_j^H \quad \text{for some } k, \; 1 \leq k \leq r. \tag{19.2} \]

Then $\mathrm{rank}(A_k) = k$ and, using that the matrix 2-norm is unitarily invariant (cp. Exercise 19.1), we get

\[ \| A - A_k \|_2 = \| \mathrm{diag}(\sigma_{k+1}, \dots, \sigma_r) \|_2 = \sigma_{k+1}. \tag{19.3} \]

Hence $A$ is approximated by the matrix $A_k$, where the rank of the approximating matrix and the approximation error in the matrix 2-norm are explicitly known. The singular value decomposition, furthermore, yields the best possible approximation of $A$ by a matrix of rank $k$ with respect to the matrix 2-norm.
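For a real matrix with two columns, the construction in the proof of Theorem 19.1 and the error formula (19.3) can be traced by hand or in a few lines of code. The sketch below is plain Python and makes several assumptions that are not in the text: the matrix is real with rank 2, the symmetric $2 \times 2$ matrix $A^T A$ is diagonalized by a single plane rotation, and the names `svd_two_columns` and `rank_one` as well as the test matrix are purely illustrative.

```python
import math

def svd_two_columns(A):
    """SVD data of a real n x 2 matrix of rank 2, built as in the proof of Theorem 19.1.

    Returns (sigma, V_cols, W_cols) with A = sum_j sigma[j] * v_j * w_j^T.
    """
    # Entries of the symmetric 2 x 2 matrix M = A^T A.
    m11 = sum(row[0] * row[0] for row in A)
    m12 = sum(row[0] * row[1] for row in A)
    m22 = sum(row[1] * row[1] for row in A)
    # Rotation angle that diagonalizes M; the first eigenvalue is the larger one.
    theta = 0.5 * math.atan2(2.0 * m12, m11 - m22)
    c, s = math.cos(theta), math.sin(theta)
    W_cols = [[c, s], [-s, c]]  # orthonormal eigenvectors w_1, w_2 of M
    sigma, V_cols = [], []
    for w in W_cols:
        Aw = [row[0] * w[0] + row[1] * w[1] for row in A]
        sig = math.sqrt(sum(x * x for x in Aw))  # sigma_j = ||A w_j||_2
        sigma.append(sig)
        V_cols.append([x / sig for x in Aw])     # v_j = A w_j / sigma_j
    return sigma, V_cols, W_cols

def rank_one(sig, v, w):
    """The matrix sig * v * w^T as a list of rows."""
    return [[sig * vi * wj for wj in w] for vi in v]

A = [[3.0, 0.0],
     [4.0, 5.0]]
sigma, V_cols, W_cols = svd_two_columns(A)

# A_1 from (19.2) and the error matrix E = A - A_1.
A1 = rank_one(sigma[0], V_cols[0], W_cols[0])
E = [[A[i][j] - A1[i][j] for j in range(2)] for i in range(2)]

# E = sigma_2 * v_2 * w_2^T has rank one, so its 2-norm equals its Frobenius norm.
err = math.sqrt(sum(x * x for row in E for x in row))
```

For this matrix $A^T A$ has the eigenvalues $45$ and $5$, so that $\sigma_1 = \sqrt{45}$, $\sigma_2 = \sqrt{5}$, and `err` agrees with $\sigma_2$ up to rounding, in line with (19.3) for $k = 1$.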


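Theorem 19.4 With $A_k$ as in (19.2), we have $\| A - A_k \|_2 \leq \| A - B \|_2$ for every matrix $B \in \mathbb{C}^{n,m}$ with $\mathrm{rank}(B) = k$.

Proof The assertion is clear for $k = \mathrm{rank}(A)$, since then $A_k = A$ and $\| A - A_k \|_2 = 0$.

Let $k < \mathrm{rank}(A) \leq m$. Let $B \in \mathbb{C}^{n,m}$ with $\mathrm{rank}(B) = k$ be given, then $\dim(\ker(B)) = m - k$, where we consider $B$ as an element of $L(\mathbb{C}^{m,1}, \mathbb{C}^{n,1})$. If $w_1, \dots, w_m$ are the right singular vectors of $A$ from (19.1), then $\mathcal{U} := \mathrm{span}\{w_1, \dots, w_{k+1}\}$ has the dimension $k + 1$. Since $\ker(B)$ and $\mathcal{U}$ are subspaces of $\mathbb{C}^{m,1}$ with $\dim(\ker(B)) + \dim(\mathcal{U}) = m + 1$, we have $\ker(B) \cap \mathcal{U} \neq \{0\}$.

Let $v \in \ker(B) \cap \mathcal{U}$ with $\| v \|_2 = 1$ be given. Then there exist $\alpha_1, \dots, \alpha_{k+1} \in \mathbb{C}$ with $v = \sum_{j=1}^{k+1} \alpha_j w_j$ and $\sum_{j=1}^{k+1} |\alpha_j|^2 = \| v \|_2^2 = 1$. Hence

\[ (A - B) v = A v - \underbrace{B v}_{= 0} = \sum_{j=1}^{k+1} \alpha_j A w_j = \sum_{j=1}^{k+1} \alpha_j \sigma_j v_j \]

and, therefore,

\[ \begin{aligned} \| A - B \|_2 &= \max_{\| y \|_2 = 1} \| (A - B) y \|_2 \geq \| (A - B) v \|_2 = \Big\| \sum_{j=1}^{k+1} \alpha_j \sigma_j v_j \Big\|_2 \\ &= \Big( \sum_{j=1}^{k+1} |\alpha_j \sigma_j|^2 \Big)^{1/2} \quad \text{(since $v_1, \dots, v_{k+1}$ are pairwise orthonormal)} \\ &\geq \sigma_{k+1} \Big( \sum_{j=1}^{k+1} |\alpha_j|^2 \Big)^{1/2} \quad \text{(since $\sigma_1 \geq \cdots \geq \sigma_{k+1}$)} \\ &= \sigma_{k+1} = \| A - A_k \|_2, \end{aligned} \]

which completes the proof. $\Box$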


MATLAB-Minute. 

The command A=magic(n) generates for $n \geq 3$ an $n \times n$ matrix $A$ with entries from $1$ to $n^2$, so that all row, column and diagonal sums of $A$ are equal. The entries of $A$ therefore form a “magic square”.

Compute the SVD of A=magic(10) using the command [V,S,W]=svd(A). What can be said about the singular values of $A$ and what is $\mathrm{rank}(A)$? Form $A_k$ for $k = 1, 2, \dots, \mathrm{rank}(A)$ as in (19.2) and verify numerically the equation (19.3).


The SVD is one of the most important and practical mathematical tools in almost all areas of science, engineering and social sciences, in medicine and even in psychology. Its great importance is due to the fact that the SVD allows one to distinguish between “important” and “non-important” information in given data. In practice, the latter corresponds, e.g., to measurement errors, noise in the transmission of data, or fine details in a signal or an image that do not play an important role. Often, the “important” information corresponds to the large singular values, and the “non-important” information to the small ones.

In many applications one sees, furthermore, that the singular values of a given matrix decay rapidly, so that there exist only few large and many small singular values. If this is the case, then the matrix can be approximated well by a matrix with low rank, since already for a small $k$ the approximation error $\| A - A_k \|_2 = \sigma_{k+1}$ is small. A low rank approximation $A_k$ requires little storage capacity in the computer; only $k$ scalars and $2k$ vectors have to be stored. This makes the SVD a powerful tool in all applications where data compression is of interest.


Example 19.5 We illustrate the use of the SVD in image compression with a picture that we obtained from the research center MATHEON: Mathematics for Key Technologies². The greyscale picture is shown on the left of the figure below. It consists of $286 \times 152$ pixels, where each of the pixels is given by a value between 0 and 64. These values are stored in a real $286 \times 152$ matrix $A$ which has (full) rank 152.

[Figure: the original picture and its approximations of rank 100, 20 and 10.]

We compute an SVD $A = V \Sigma W^T$ using the command [V,S,W]=svd(A) in MATLAB. The diagonal entries of the matrix S, i.e., the singular values of $A$, are ordered decreasingly by MATLAB (as in Theorem 19.1). For $k = 100, 20, 10$ we now compute matrices $A_k$ with rank $k$ as in (19.2) using the command Ak=V(:,1:k)*S(1:k,1:k)*W(:,1:k)'. These matrices represent approximations of the original picture based on the $k$ largest singular values and the corresponding singular vectors. The three approximations are shown next to the original picture above. The quality of the approximation decreases with decreasing $k$, but even the approximation for $k = 10$ shows the essential features of the “MATHEON bear”.


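Another important application of the SVD arises in the solution of linear systems of equations. If $A \in \mathbb{C}^{n,m}$ has an SVD of the form (19.1), we define the matrix

\[ A^{\dagger} := W \Sigma^{\dagger} V^H \in \mathbb{C}^{m,n}, \quad \text{where} \quad \Sigma^{\dagger} := \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m,n}. \tag{19.4} \]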


² We thank Falk Ebert for his help. The original bear can be seen in front of the Mathematics building of the TU Berlin. More information on MATHEON can be found at www.matheon.de.

One easily sees that

\[ A^{\dagger} A = W \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} W^H \in \mathbb{C}^{m,m}. \]

If $r = m = n$, then $A$ is invertible and the right hand side of the above equation is equal to the identity matrix $I_n$. In this case we have $A^{\dagger} = A^{-1}$. The matrix $A^{\dagger}$ can therefore be viewed as a generalized inverse, that in the case of an invertible matrix $A$ is equal to the inverse of $A$.


Definition 19.6 The matrix $A^{\dagger}$ in (19.4) is called Moore-Penrose inverse³ or pseudoinverse of $A$.


Let $A \in \mathbb{C}^{n,m}$ and $b \in \mathbb{C}^{n,1}$ be given. If the linear system of equations $Ax = b$ has no solution, then we can try to find an $\hat{x} \in \mathbb{C}^{m,1}$ such that $A\hat{x}$ is “as close as possible” to $b$. Using the Moore-Penrose inverse we obtain the best possible approximation with respect to the Euclidean norm.


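Theorem 19.7 Let $A \in \mathbb{C}^{n,m}$ with $n \geq m$ and $b \in \mathbb{C}^{n,1}$ be given. If $A = V \Sigma W^H$ is an SVD, and $A^{\dagger}$ is as in (19.4), then $\hat{x} = A^{\dagger} b$ satisfies

\[ \| b - A\hat{x} \|_2 \leq \| b - Ay \|_2 \quad \text{for all } y \in \mathbb{C}^{m,1}, \]

and

\[ \| \hat{x} \|_2 = \Big( \sum_{j=1}^{r} \Big| \frac{v_j^H b}{\sigma_j} \Big|^2 \Big)^{1/2} \leq \| y \|_2 \]

for all $y \in \mathbb{C}^{m,1}$ with $\| b - A\hat{x} \|_2 = \| b - Ay \|_2$.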
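Proof Let $y \in \mathbb{C}^{m,1}$ be given and let $z = [\xi_1, \dots, \xi_m]^T := W^H y$. Then

\[ \begin{aligned} \| b - Ay \|_2^2 &= \| b - V \Sigma W^H y \|_2^2 = \| V (V^H b - \Sigma z) \|_2^2 = \| V^H b - \Sigma z \|_2^2 \\ &= \sum_{j=1}^{r} \big| v_j^H b - \sigma_j \xi_j \big|^2 + \sum_{j=r+1}^{n} \big| v_j^H b \big|^2 \\ &\geq \sum_{j=r+1}^{n} \big| v_j^H b \big|^2. \end{aligned} \tag{19.5} \]

Equality holds if and only if $\xi_j = (v_j^H b) / \sigma_j$ for all $j = 1, \dots, r$. This is satisfied if $z = W^H y = \Sigma^{\dagger} V^H b$. The last equation holds if and only if

\[ y = W \Sigma^{\dagger} V^H b = A^{\dagger} b = \hat{x}. \]

The vector $\hat{x}$ therefore attains the lower bound (19.5). The equation

\[ \| \hat{x} \|_2 = \Big( \sum_{j=1}^{r} \Big| \frac{v_j^H b}{\sigma_j} \Big|^2 \Big)^{1/2} \]

is easily checked. Every vector $y \in \mathbb{C}^{m,1}$ that attains the lower bound (19.5) must have the form

\[ y = W \Big[ \frac{v_1^H b}{\sigma_1}, \dots, \frac{v_r^H b}{\sigma_r}, y_{r+1}, \dots, y_m \Big]^T \]

for some $y_{r+1}, \dots, y_m \in \mathbb{C}$, which implies that $\| y \|_2 \geq \| \hat{x} \|_2$. $\Box$

³ Eliakim Hastings Moore (1862-1932) and Sir Roger Penrose (1931-).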


The minimization problem for the vector $\hat{x}$ can be written as

\[ \| b - A\hat{x} \|_2 = \min_{y \in \mathbb{C}^{m,1}} \| b - Ay \|_2. \]

If

\[ A = \begin{bmatrix} \tau_1 & 1 \\ \vdots & \vdots \\ \tau_m & 1 \end{bmatrix} \in \mathbb{R}^{m,2} \]

for (pairwise distinct) $\tau_1, \dots, \tau_m \in \mathbb{R}$, then this minimization problem corresponds to the problem of linear regression and the least squares approximation in Example 12.16, that we have solved with the QR-decomposition of $A$. If $A = QR$ is this decomposition, then $A^{\dagger} = (A^H A)^{-1} A^H$ (cp. Exercise 19.5) and we have

\[ A^{\dagger} = (R^H Q^H Q R)^{-1} R^H Q^H = R^{-1} (R^H)^{-1} R^H Q^H = R^{-1} Q^H. \]

Thus, the solution of the least-squares approximation in Example 12.16 is identical to the solution of the above minimization problem using the SVD of $A$.
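For the regression matrix with rows $[\tau_j, 1]$ the formula $A^{\dagger} = (A^T A)^{-1} A^T$ leads to a $2 \times 2$ linear system that can be solved in closed form. The following plain Python sketch assumes real data with at least two distinct $\tau_j$; the function name `fit_line` and the sample data are illustrative and not taken from the text.

```python
def fit_line(taus, b):
    """Least squares solution x = A^+ b for A = [[tau_1, 1], ..., [tau_m, 1]].

    If at least two tau_j differ, then A has rank 2, so that
    A^+ = (A^T A)^{-1} A^T and x solves (A^T A) x = A^T b.
    """
    m = len(taus)
    # Entries of the 2 x 2 matrix A^T A = [[s_tt, s_t], [s_t, m]].
    s_tt = sum(t * t for t in taus)
    s_t = sum(taus)
    # Entries of the vector A^T b.
    r1 = sum(t * y for t, y in zip(taus, b))
    r2 = sum(b)
    # Determinant of A^T A; it is nonzero because A has full column rank.
    det = s_tt * m - s_t * s_t
    slope = (m * r1 - s_t * r2) / det
    intercept = (s_tt * r2 - s_t * r1) / det
    return slope, intercept

# Data that lie exactly on the line b = 2*tau + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Since the sample data lie exactly on a line, the residual $\| b - A\hat{x} \|_2$ is zero here and the computed slope and intercept recover that line.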


Exercises 


19.1 Show that the Frobenius norm and the matrix 2-norm are unitarily invariant, i.e., that $\| PAQ \|_F = \| A \|_F$ and $\| PAQ \|_2 = \| A \|_2$ for all $A \in \mathbb{C}^{n,m}$ and unitary matrices $P \in \mathbb{C}^{n,n}$, $Q \in \mathbb{C}^{m,m}$.
(Hint: For the Frobenius norm one can use that $\| A \|_F^2 = \mathrm{trace}(A^H A)$.)

19.2 Use the result of Exercise 19.1 to show that $\| A \|_F = (\sigma_1^2 + \dots + \sigma_r^2)^{1/2}$ and $\| A \|_2 = \sigma_1$, where $\sigma_1 \geq \cdots \geq \sigma_r > 0$ are the singular values of $A \in \mathbb{C}^{n,m}$.

19.3 Show that $\| A \|_2 = \| A^H \|_2$ and $\| A \|_2^2 = \| A^H A \|_2$ for all $A \in \mathbb{C}^{n,m}$.

19.4 Show that $\| A \|_2^2 \leq \| A \|_1 \| A \|_\infty$ for all $A \in \mathbb{C}^{n,m}$.

19.5 Let $A \in \mathbb{C}^{n,m}$ and let $A^{\dagger}$ be the Moore-Penrose inverse of $A$. Show the following assertions:

(a) If $\mathrm{rank}(A) = m$, then $A^{\dagger} = (A^H A)^{-1} A^H$.
(b) The matrix $X = A^{\dagger}$ is the uniquely determined matrix that satisfies the following four matrix equations:
(1) $AXA = A$,
(2) $XAX = X$,
(3) $(AX)^H = AX$,
(4) $(XA)^H = XA$.

19.6 Let
2 1 
A=|0 3|e«R*”, b= 2 | eR. 
l=} 


Compute the Moore-Penrose inverse of $A$ and a vector $\hat{x} \in \mathbb{R}^{2,1}$ such that

(a) $\| b - A\hat{x} \|_2 \leq \| b - Ay \|_2$ for all $y \in \mathbb{R}^{2,1}$, and
(b) $\| \hat{x} \|_2 \leq \| y \|_2$ for all $y \in \mathbb{R}^{2,1}$ with $\| b - Ay \|_2 = \| b - A\hat{x} \|_2$.

19.7 Prove the following theorem:

Let $A \in \mathbb{C}^{n,m}$ and $B \in \mathbb{C}^{\ell,m}$ with $m \leq n \leq \ell$. Then $A^H A = B^H B$ if and only if $B = UA$ for a matrix $U \in \mathbb{C}^{\ell,n}$ with $U^H U = I_n$. If $A$ and $B$ are real, then $U$ can also be chosen to be real.

(Hint: One direction is trivial. For the other direction consider the unitary diagonalization of $A^H A = B^H B$. This yields the matrix $W$ in the SVD of $A$ and of $B$. Show the assertion using these two decompositions. This theorem and its applications can be found in the article [HorO96].)


Chapter 20 
The Kronecker Product and Linear Matrix 
Equations 


Many applications, in particular the stability analysis of differential equations, lead to linear matrix equations, such as $AX + XB = C$. Here the matrices $A, B, C$ are given and the goal is to determine a matrix $X$ that solves the equation (we will give a formal definition below). In the description of the solutions of such equations, the Kronecker product,¹ another product of matrices, is useful. In this chapter we develop the most important properties of this product and we study its application in the context of linear matrix equations. Many more results on this topic can be found in the books [HorJ91, LanT85].


Definition 20.1 If $K$ is a field, $A = [a_{ij}] \in K^{m,m}$ and $B \in K^{n,n}$, then

\[ A \otimes B := [a_{ij} B] = \begin{bmatrix} a_{11} B & \cdots & a_{1m} B \\ \vdots & & \vdots \\ a_{m1} B & \cdots & a_{mm} B \end{bmatrix} \]

is called the Kronecker product of $A$ and $B$.


The Kronecker product is sometimes called the tensor product of matrices. This product defines a map from $K^{m,m} \times K^{n,n}$ to $K^{mn,mn}$. The definition can be extended to non-square matrices, but for simplicity we consider here only the case of square matrices. The following lemma presents the basic computational rules of the Kronecker product.


Lemma 20.2 For all square matrices $A, B, C$ over $K$, the following computational rules hold:

(1) $A \otimes (B \otimes C) = (A \otimes B) \otimes C$.

(2) $(\mu A) \otimes B = A \otimes (\mu B) = \mu (A \otimes B)$ for all $\mu \in K$.

(3) $(A + B) \otimes C = (A \otimes C) + (B \otimes C)$, whenever $A + B$ is defined.

(4) $A \otimes (B + C) = (A \otimes B) + (A \otimes C)$, whenever $B + C$ is defined.

(5) $(A \otimes B)^T = A^T \otimes B^T$, and therefore the Kronecker product of two symmetric matrices is symmetric.


Proof Exercise. $\Box$


In particular, in contrast to the standard matrix multiplication, the order of the factors in the Kronecker product does not change under transposition. The following result describes the matrix multiplication of two Kronecker products.

¹ Leopold Kronecker (1832-1891) is said to have used this product in his lectures in Berlin in the 1880s. It was defined formally for the first time in 1858 by Johann Georg Zehfuss (1832-1901).


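Lemma 20.3 For $A, C \in K^{m,m}$ and $B, D \in K^{n,n}$ we have

\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD).
\]

Hence, in particular,

(1) $A \otimes B = (A \otimes I_n)(I_m \otimes B) = (I_m \otimes B)(A \otimes I_n)$,
(2) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, if $A$ and $B$ are invertible.


Proof Since $A \otimes B = [a_{ij} B]$ and $C \otimes D = [c_{ij} D]$, the block $F_{ij} \in K^{n,n}$ in the
block matrix $[F_{ij}] = (A \otimes B)(C \otimes D)$ is given by

\[
F_{ij} = \sum_{k=1}^{m} (a_{ik} B)(c_{kj} D) = \sum_{k=1}^{m} a_{ik} c_{kj} BD
= \Big( \sum_{k=1}^{m} a_{ik} c_{kj} \Big) BD.
\]

For the block matrix $[G_{ij}] = (AC) \otimes (BD)$ with $G_{ij} \in K^{n,n}$ we obtain

\[
G_{ij} = g_{ij} BD, \quad \text{where} \quad g_{ij} = \sum_{k=1}^{m} a_{ik} c_{kj},
\]

which shows $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$. Now (1) and (2) easily follow from
this equation. □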
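The product rule of Lemma 20.3 and the factorization (1) can be checked on a concrete instance with $m = 2$ and $n = 3$. The following Python sketch is illustrative only; the helper names and the integer test matrices are assumptions made here.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    """Ordinary matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]                     # m = 2
C = [[0, 1], [2, -1]]
B = [[1, 0, 2], [0, 1, 1], [3, 0, 1]]    # n = 3
D = [[2, 1, 0], [0, 1, 0], [1, 0, 1]]

# (A kron B)(C kron D) versus (AC) kron (BD)
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
print(lhs == rhs)

# Rule (1): A kron B = (A kron I_n)(I_m kron B) = (I_m kron B)(A kron I_n)
I_m, I_n = identity(2), identity(3)
split_1 = matmul(kron(A, I_n), kron(I_m, B))
split_2 = matmul(kron(I_m, B), kron(A, I_n))
print(kron(A, B) == split_1 == split_2)
```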


In general the Kronecker product is non-commutative (cp. Exercise 20.2), but we
have the following relationship between $A \otimes B$ and $B \otimes A$.


Lemma 20.4 For $A \in K^{m,m}$ and $B \in K^{n,n}$ there exists a permutation matrix
$P \in K^{mn,mn}$ with

\[
P^T (A \otimes B) P = B \otimes A.
\]


Proof Exercise. □
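One possible choice of the permutation matrix in Lemma 20.4 can be written down explicitly. The sketch below is an assumption-laden illustration, not the proof asked for in the exercise: with indices counted from zero, it sets the entry of $P$ in row $in + p$ and column $pm + i$ to one, and the function name `shuffle_matrix` is hypothetical.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def shuffle_matrix(m, n):
    """Permutation matrix P of size mn x mn with P[i*n + p][p*m + i] = 1."""
    P = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for p in range(n):
            P[i * n + p][p * m + i] = 1
    return P


A = [[1, 2], [3, 4]]                     # m = 2
B = [[1, 0, 2], [0, 1, 1], [3, 0, 1]]    # n = 3
P = shuffle_matrix(2, 3)

swapped = matmul(matmul(transpose(P), kron(A, B)), P)
print(swapped == kron(B, A))
```

The index map sends the position of $a_{ij} b_{pq}$ in $A \otimes B$ to its position in $B \otimes A$, which is why the same $P$ acts on rows and columns.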


For the computation of the determinant, trace and rank of a Kronecker product 
there exist simple formulas. 
Theorem 20.5 For $A \in K^{m,m}$ and $B \in K^{n,n}$ the following rules hold:


(1) $\det(A \otimes B) = (\det A)^n (\det B)^m = \det(B \otimes A)$.
(2) $\mathrm{trace}(A \otimes B) = \mathrm{trace}(A)\,\mathrm{trace}(B) = \mathrm{trace}(B \otimes A)$.
(3) $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B) = \mathrm{rank}(B \otimes A)$.


Proof (1) From (1) in Lemma 20.3 and the multiplication theorem for determinants
(cp. Theorem 7.15) we get

\[
\det(A \otimes B) = \det\big( (A \otimes I_n)(I_m \otimes B) \big) = \det(A \otimes I_n) \det(I_m \otimes B).
\]

By Lemma 20.4 there exists a permutation matrix $P$ with $A \otimes I_n = P (I_n \otimes A) P^T$.
This implies that

\[
\det(A \otimes I_n) = \det\big( P (I_n \otimes A) P^T \big) = \det(I_n \otimes A) = (\det A)^n.
\]

Since $\det(I_m \otimes B) = (\det B)^m$, it then follows that $\det(A \otimes B) = (\det A)^n (\det B)^m$,
and therefore also $\det(A \otimes B) = \det(B \otimes A)$.

(2) From $A \otimes B = [a_{ij} B]$ we obtain

\[
\mathrm{trace}(A \otimes B) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ii} b_{jj}
= \Big( \sum_{i=1}^{m} a_{ii} \Big) \Big( \sum_{j=1}^{n} b_{jj} \Big)
= \mathrm{trace}(A)\,\mathrm{trace}(B) = \mathrm{trace}(B)\,\mathrm{trace}(A) = \mathrm{trace}(B \otimes A).
\]

(3) Exercise. □
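Rules (1) and (2) of Theorem 20.5 can be checked on a small example with $m = 2$ and $n = 3$. The sketch below is a sanity check only; the helper names are assumptions, the determinant is computed by a plain Laplace expansion along the first row (adequate for a $6 \times 6$ matrix, not efficient in general), and rule (3) is not checked here.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]                     # m = 2
B = [[2, 0, 1], [1, 1, 0], [0, 1, 1]]    # n = 3
m, n = len(A), len(B)

det_AB = det(kron(A, B))
print(det_AB == det(A) ** n * det(B) ** m == det(kron(B, A)))
print(trace(kron(A, B)) == trace(A) * trace(B) == trace(kron(B, A)))
```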

For a matrix $A = [a_1, \ldots, a_n] \in K^{m,n}$ with columns $a_j \in K^{m,1}$, $j = 1, \ldots, n$,
we define

\[
\mathrm{vec}(A) := \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \in K^{mn,1}.
\]

The application of vec turns the matrix $A$ into a “column vector” and thus “vectorizes”
$A$.


Lemma 20.6 The map $\mathrm{vec} : K^{m,n} \to K^{mn,1}$ is an isomorphism. In particular,
$A_1, \ldots, A_k \in K^{m,n}$ are linearly independent if and only if
$\mathrm{vec}(A_1), \ldots, \mathrm{vec}(A_k) \in K^{mn,1}$ are linearly independent.


Proof Exercise. □
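The vec map and its inverse are easy to write down, which makes the bijectivity claimed in Lemma 20.6 tangible. In the following Python sketch the names `vec` and `unvec` and the list-of-rows representation are assumptions made for illustration.

```python
def vec(A):
    """Stack the columns of A (a list of rows) into one long list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def unvec(x, m, n):
    """Inverse of vec: rebuild the m x n matrix whose columns were stacked in x."""
    return [[x[j * m + i] for j in range(n)] for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]            # m = 2, n = 3
x = vec(A)
print(x)                   # columns (1, 4), (2, 5), (3, 6) stacked
print(unvec(x, 2, 3) == A)

# Linearity on one example: vec(A + 2B) = vec(A) + 2 vec(B)
B = [[0, 1, 0], [1, 0, 1]]
combo = [[a + 2 * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
print(vec(combo) == [a + 2 * b for a, b in zip(vec(A), vec(B))])
```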


We now consider the relationship between the Kronecker product and the vec 
map. 
Theorem 20.7 For $A \in K^{m,m}$, $B \in K^{n,n}$ and $C \in K^{m,n}$ we have

\[
\mathrm{vec}(ACB) = (B^T \otimes A)\,\mathrm{vec}(C).
\]

Hence, in particular,

(1) $\mathrm{vec}(AC) = (I_n \otimes A)\,\mathrm{vec}(C)$ and $\mathrm{vec}(CB) = (B^T \otimes I_m)\,\mathrm{vec}(C)$,
(2) $\mathrm{vec}(AC + CB) = \big( (I_n \otimes A) + (B^T \otimes I_m) \big)\,\mathrm{vec}(C)$.


Proof For $j = 1, \ldots, n$, the $j$th column of $ACB$ is given by

\[
(ACB) e_j = (AC)(B e_j) = \sum_{k=1}^{n} b_{kj} (AC) e_k = \sum_{k=1}^{n} (b_{kj} A)(C e_k)
= [\, b_{1j} A, \; b_{2j} A, \; \ldots, \; b_{nj} A \,]\,\mathrm{vec}(C),
\]

which implies that $\mathrm{vec}(ACB) = (B^T \otimes A)\,\mathrm{vec}(C)$. With $B = I_n$ resp. $A = I_m$ we
obtain (1), while (1) and the linearity of vec yield (2). □
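The identity of Theorem 20.7 can be verified on a concrete example with $m = 2$ and $n = 3$. The Python sketch below is illustrative; all helper names and the integer test matrices are assumptions made here.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def transpose(A):
    return [list(col) for col in zip(*A)]


def vec(A):
    """Stack the columns of A into one long list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


A = [[1, 2], [3, 4]]                     # m x m
B = [[1, 0, 2], [0, 1, 1], [3, 0, 1]]    # n x n
C = [[1, 2, 3], [4, 5, 6]]               # m x n

lhs = vec(matmul(matmul(A, C), B))               # vec(ACB)
rhs = matvec(kron(transpose(B), A), vec(C))      # (B^T kron A) vec(C)
print(lhs == rhs)
```

Note that the transposed matrix $B^T$ appears as the first Kronecker factor, because vec stacks columns and the $j$th column of $ACB$ mixes the columns of $AC$ with the entries of the $j$th column of $B$.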


In order to study the relationship between the eigenvalues of the matrices $A$, $B$ and
those of the Kronecker product $A \otimes B$, we use bivariate polynomials, i.e., polynomials
in two variables (cp. Exercise 9.10). If

\[
p(t_1, t_2) = \sum_{i,j=0}^{l} \alpha_{ij} t_1^i t_2^j \in K[t_1, t_2]
\]

is such a polynomial, then for $A \in K^{m,m}$ and $B \in K^{n,n}$ we define the matrix

\[
p(A, B) := \sum_{i,j=0}^{l} \alpha_{ij} A^i \otimes B^j. \tag{20.1}
\]

Here we have to be careful with the order of the factors, since in general
$A^i \otimes B^j \neq B^j \otimes A^i$ (cp. Exercise 20.2).


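Example 20.8 For $A \in \mathbb{R}^{m,m}$, $B \in \mathbb{R}^{n,n}$ and
$p(t_1, t_2) = 2 t_1 + 3 t_1 t_2^2 = 2 t_1^1 t_2^0 + 3 t_1^1 t_2^2 \in \mathbb{R}[t_1, t_2]$
we get the matrix $p(A, B) = 2 A \otimes I_n + 3 A \otimes B^2$.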
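The definition (20.1) can be evaluated mechanically from the coefficients of $p$. The following Python sketch is one possible way to do this; the function name `bivariate` and the dictionary encoding `{(i, j): alpha_ij}` of the polynomial are assumptions made here. It is applied to the polynomial of Example 20.8 and two sample $2 \times 2$ matrices.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def power(M, k):
    """M^k for k >= 0, with M^0 the identity matrix."""
    R = identity(len(M))
    for _ in range(k):
        R = matmul(R, M)
    return R


def bivariate(coeffs, A, B):
    """p(A, B) = sum of alpha_ij * (A^i kron B^j) for coeffs = {(i, j): alpha_ij}."""
    size = len(A) * len(B)
    P = [[0] * size for _ in range(size)]
    for (i, j), alpha in coeffs.items():
        term = kron(power(A, i), power(B, j))
        P = [[p + alpha * t for p, t in zip(rp, rt)] for rp, rt in zip(P, term)]
    return P


coeffs = {(1, 0): 2, (1, 2): 3}          # p(t1, t2) = 2 t1 + 3 t1 t2^2
A = [[1, 2], [0, 3]]
B = [[1, 1], [0, 2]]
P = bivariate(coeffs, A, B)

# Direct evaluation of 2 A kron I_n + 3 A kron B^2 for comparison
direct = [[2 * x + 3 * y for x, y in zip(rx, ry)]
          for rx, ry in zip(kron(A, identity(2)), kron(A, matmul(B, B)))]
print(P == direct)
```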


The following result is known as Stephanos’ theorem.²


² Named after Cyparissos Stephanos (1857-1917), who in 1900 showed, besides this result, also the
assertion of Lemma 20.3.
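Theorem 20.9 Let $A \in K^{m,m}$ and $B \in K^{n,n}$ be two matrices that have Jordan normal
forms and the eigenvalues $\lambda_1, \ldots, \lambda_m \in K$ and $\mu_1, \ldots, \mu_n \in K$, respectively.
If $p(A, B)$ is defined as in (20.1), then the following assertions hold:


(1) The eigenvalues of $p(A, B)$ are $p(\lambda_k, \mu_\ell)$ for $k = 1, \ldots, m$ and $\ell = 1, \ldots, n$.
(2) The eigenvalues of $A \otimes B$ are $\lambda_k \cdot \mu_\ell$ for $k = 1, \ldots, m$ and $\ell = 1, \ldots, n$.
(3) The eigenvalues of $A \otimes I_n + I_m \otimes B$ are $\lambda_k + \mu_\ell$ for $k = 1, \ldots, m$ and $\ell = 1, \ldots, n$.


Proof Let $S \in GL_m(K)$ and $T \in GL_n(K)$ be such that $S^{-1} A S = J_A$ and $T^{-1} B T = J_B$
are in Jordan canonical form. The matrices $J_A$ and $J_B$ are upper triangular. Thus,
for all $i, j \in \mathbb{N}_0$ the matrices $J_A^i$, $J_B^j$ and $J_A^i \otimes J_B^j$ are upper triangular. The eigenvalues
of $J_A^i$ and $J_B^j$ are $\lambda_1^i, \ldots, \lambda_m^i$ and $\mu_1^j, \ldots, \mu_n^j$, respectively. Thus, $p(\lambda_k, \mu_\ell)$,
$k = 1, \ldots, m$, $\ell = 1, \ldots, n$, are the diagonal entries of the matrix $p(J_A, J_B)$. Using
Lemma 20.3 we obtain

\[
\begin{aligned}
p(A, B) &= \sum_{i,j=0}^{l} \alpha_{ij} (S J_A S^{-1})^i \otimes (T J_B T^{-1})^j
         = \sum_{i,j=0}^{l} \alpha_{ij} (S J_A^i S^{-1}) \otimes (T J_B^j T^{-1}) \\
        &= \sum_{i,j=0}^{l} \alpha_{ij} \big( (S J_A^i) \otimes (T J_B^j) \big) (S^{-1} \otimes T^{-1}) \\
        &= \sum_{i,j=0}^{l} \alpha_{ij} (S \otimes T)(J_A^i \otimes J_B^j)(S \otimes T)^{-1} \\
        &= (S \otimes T) \Big( \sum_{i,j=0}^{l} \alpha_{ij} J_A^i \otimes J_B^j \Big) (S \otimes T)^{-1} \\
        &= (S \otimes T)\, p(J_A, J_B)\, (S \otimes T)^{-1},
\end{aligned}
\]

which implies (1).
The assertions (2) and (3) follow from (1) with $p(t_1, t_2) = t_1 t_2$ and
$p(t_1, t_2) = t_1 + t_2$, respectively. □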
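The key step of the proof, namely that for upper triangular factors the eigenvalues can be read off the diagonal, can be observed directly. The sketch below covers only this triangular special case; the helper names and the two sample matrices are assumptions made for illustration.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def is_upper_triangular(M):
    return all(M[r][c] == 0 for r in range(len(M)) for c in range(r))


def diagonal(M):
    return [M[r][r] for r in range(len(M))]


A = [[1, 2], [0, 3]]       # upper triangular, eigenvalues 1 and 3
B = [[2, 1], [0, 5]]       # upper triangular, eigenvalues 2 and 5
lam, mu = diagonal(A), diagonal(B)

product = kron(A, B)
kron_sum = [[x + y for x, y in zip(rx, ry)]
            for rx, ry in zip(kron(A, identity(2)), kron(identity(2), B))]

# Both matrices are again upper triangular, so their diagonals hold the eigenvalues.
print(is_upper_triangular(product), diagonal(product))
print(is_upper_triangular(kron_sum), diagonal(kron_sum))
```

For general matrices the same statement holds after the similarity transformation with $S \otimes T$ used in the proof.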


The following result on the matrix exponential function of a Kronecker product 
is helpful in applications that involve systems of linear differential equations. 


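Lemma 20.10 For $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C := (A \otimes I_n) + (I_m \otimes B)$ we have

\[
\exp(C) = \exp(A) \otimes \exp(B).
\]


Proof From Lemma 20.3 we know that the matrices $A \otimes I_n$ and $I_m \otimes B$ commute.
Using Lemma 17.6 we obtain

\[
\begin{aligned}
\exp(C) &= \exp(A \otimes I_n + I_m \otimes B) = \exp(A \otimes I_n) \exp(I_m \otimes B) \\
        &= \Big( \sum_{j=0}^{\infty} \frac{1}{j!} (A \otimes I_n)^j \Big) \Big( \sum_{i=0}^{\infty} \frac{1}{i!} (I_m \otimes B)^i \Big) \\
        &= \sum_{j=0}^{\infty} \frac{1}{j!} \sum_{i=0}^{\infty} \frac{1}{i!} (A \otimes I_n)^j (I_m \otimes B)^i \\
        &= \sum_{j=0}^{\infty} \frac{1}{j!} \sum_{i=0}^{\infty} \frac{1}{i!} (A^j \otimes B^i) \\
        &= \exp(A) \otimes \exp(B),
\end{aligned}
\]

where we have used the properties of the matrix exponential series
(cp. Sect. 17.1). □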
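Lemma 20.10 can be checked exactly when $A$ and $B$ are nilpotent, because then the exponential series terminates. The following Python sketch relies on that assumption: the function `exp_series` (a name chosen here) sums only finitely many terms, which is exact for the nilpotent sample matrices below and would merely be an approximation for general matrices.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def exp_series(M, terms=10):
    """Partial sum of M^k / k! for k < terms; exact if M^terms = 0."""
    n = len(M)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M^0
    factorial = 1
    for k in range(terms):
        if k > 0:
            power = matmul(power, M)
            factorial *= k
        total = [[t + p / factorial for t, p in zip(rt, rp)]
                 for rt, rp in zip(total, power)]
    return total


A = [[0, 1], [0, 0]]       # A^2 = 0, so exp(A) = I + A
B = [[0, 2], [0, 0]]       # B^2 = 0, so exp(B) = I + B
C = [[x + y for x, y in zip(rx, ry)]
     for rx, ry in zip(kron(A, identity(2)), kron(identity(2), B))]

print(exp_series(C) == kron(exp_series(A), exp_series(B)))
```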

For given matrices $A_j \in K^{m,m}$, $B_j \in K^{n,n}$, $j = 1, \ldots, q$, and $C \in K^{m,n}$ an
equation of the form

\[
A_1 X B_1 + A_2 X B_2 + \cdots + A_q X B_q = C \tag{20.2}
\]

is called a linear matrix equation for the unknown matrix $X \in K^{m,n}$.


Theorem 20.11 A matrix $X \in K^{m,n}$ solves (20.2) if and only if $x := \mathrm{vec}(X) \in K^{mn,1}$
solves the linear system of equations

\[
G x = \mathrm{vec}(C), \quad \text{where} \quad G := \sum_{j=1}^{q} B_j^T \otimes A_j.
\]

Proof Exercise. □
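Theorem 20.11 suggests a direct, if not efficient, way to solve small linear matrix equations: assemble $G$, solve $Gx = \mathrm{vec}(C)$, and reshape $x$. The Python sketch below does this for $q = 2$ with exact rational arithmetic. The helper names, the sample matrices, and the use of Gauss-Jordan elimination are assumptions made here; the solver presupposes that $G$ is invertible.

```python
from fractions import Fraction


def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def vec(A):
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def unvec(x, m, n):
    return [[x[j * m + i] for j in range(n)] for i in range(m)]


def solve(G, b):
    """Gauss-Jordan elimination over the rationals; assumes G is invertible."""
    n = len(G)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(G, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[-1] for row in M]


A1, B1 = [[2, 0], [0, 3]], [[1, 0], [0, 2]]
A2, B2 = [[0, 1], [1, 0]], [[1, 1], [0, 1]]
X_true = [[1, 2], [3, 4]]

# Right-hand side C = A1 X B1 + A2 X B2 for the known X
T1 = matmul(matmul(A1, X_true), B1)
T2 = matmul(matmul(A2, X_true), B2)
C = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(T1, T2)]

# G = B1^T kron A1 + B2^T kron A2, then solve G x = vec(C)
G = [[s + t for s, t in zip(rs, rt)]
     for rs, rt in zip(kron(transpose(B1), A1), kron(transpose(B2), A2))]
X = unvec(solve(G, vec(C)), 2, 2)
print(X == X_true)
```

Since $G$ has size $mn \times mn$, this approach is practical only for small $m$ and $n$.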
We now consider two special cases of (20.2). 


Theorem 20.12 For $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C \in \mathbb{C}^{m,n}$ the Sylvester equation³

\[
AX + XB = C \tag{20.3}
\]

has a unique solution if and only if $A$ and $-B$ have no common eigenvalue. If all
eigenvalues of $A$ and $B$ have negative real parts, then the unique solution of (20.3)
is given by

\[
X = -\int_0^{\infty} \exp(tA)\, C \exp(tB)\, dt.
\]

(As in Sect. 17.2 the integral is defined entrywise.)


³ James Joseph Sylvester (1814-1897).
Proof Analogous to the representation in Theorem 20.11, we can write the Sylvester
equation (20.3) as

\[
(I_n \otimes A + B^T \otimes I_m)\, x = \mathrm{vec}(C).
\]

If $A$ and $B$ have the eigenvalues $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_n$, respectively, then
$G = I_n \otimes A + B^T \otimes I_m$ by (3) in Theorem 20.9 has the eigenvalues $\lambda_k + \mu_\ell$,
$k = 1, \ldots, m$, $\ell = 1, \ldots, n$. Thus, $G$ is invertible, and the Sylvester equation is uniquely
solvable, if and only if $\lambda_k + \mu_\ell \neq 0$ for all $k = 1, \ldots, m$ and $\ell = 1, \ldots, n$.

Let $A$ and $B$ be matrices with eigenvalues that have negative real parts. Then $A$ and
$-B$ have no common eigenvalues and (20.3) has a unique solution. Let $J_A = S^{-1} A S$
and $J_B = T^{-1} B T$ be Jordan canonical forms of $A$ and $B$. We consider the linear
differential equation

\[
\frac{dZ}{dt} = AZ + ZB, \quad Z(0) = C, \tag{20.4}
\]

that is solved by the function

\[
Z : [0, \infty) \to \mathbb{C}^{m,n}, \quad Z(t) := \exp(tA)\, C \exp(tB)
\]

(cp. Exercise 20.10). This function satisfies

\[
\lim_{t \to \infty} Z(t) = \lim_{t \to \infty} \exp(tA)\, C \exp(tB)
= \lim_{t \to \infty} S \underbrace{\exp(t J_A)}_{\to 0} \underbrace{S^{-1} C T}_{\text{constant}} \underbrace{\exp(t J_B)}_{\to 0} T^{-1} = 0.
\]

Integration of equation (20.4) from $t = 0$ to $t = \infty$ yields

\[
-C = -Z(0) = \lim_{t \to \infty} \big( Z(t) - Z(0) \big)
= A \int_0^{\infty} Z(t)\, dt + \Big( \int_0^{\infty} Z(t)\, dt \Big) B.
\]

(Here we use without proof the existence of the infinite integrals.) This implies that

\[
X := -\int_0^{\infty} Z(t)\, dt = -\int_0^{\infty} \exp(tA)\, C \exp(tB)\, dt
\]

is the unique solution of (20.3). □
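As a plausibility check of Theorem 20.12, the scalar case $m = n = 1$ can be worked out by hand. This worked example is an addition for illustration; the symbols $a, b, c \in \mathbb{C}$ stand for the $1 \times 1$ matrices $A$, $B$, $C$.

```latex
% Scalar Sylvester equation: a x + x b = c, i.e. (a + b) x = c.
% It is uniquely solvable if and only if a + b \neq 0, i.e. a \neq -b.
% If Re(a) < 0 and Re(b) < 0, then e^{t(a+b)} \to 0 for t \to \infty, and
\[
-\int_0^{\infty} e^{ta}\, c\, e^{tb}\, dt
= -c \int_0^{\infty} e^{t(a+b)}\, dt
= -c \left[ \frac{e^{t(a+b)}}{a+b} \right]_{t=0}^{t \to \infty}
= -c \left( 0 - \frac{1}{a+b} \right)
= \frac{c}{a+b},
\]
% which is exactly the solution x = c / (a + b) of (a + b) x = c.
```

The condition $a \neq -b$ is the scalar form of "$A$ and $-B$ have no common eigenvalue".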


Theorem 20.12 also gives the solution of another important matrix equation. 


Corollary 20.13 For $A, C \in \mathbb{C}^{n,n}$ the Lyapunov equation⁴

\[
AX + XA^H = -C \tag{20.5}
\]

has a unique solution $X \in \mathbb{C}^{n,n}$ if the eigenvalues of $A$ have negative real parts.
If, furthermore, $C$ is Hermitian positive definite, then also $X$ is Hermitian positive
definite.


⁴ Alexandr Mikhailovich Lyapunov (also Ljapunov or Liapunov; 1857-1918).


Proof Since by assumption $A$ and $-A^H$ have no common eigenvalues, the unique
solvability of (20.5) follows from Theorem 20.12, and the solution is given by the
matrix

\[
X = -\int_0^{\infty} \exp(tA)(-C) \exp(tA^H)\, dt = \int_0^{\infty} \exp(tA)\, C \exp(tA^H)\, dt.
\]

If $C$ is Hermitian positive definite, then $X$ is Hermitian and for $x \in \mathbb{C}^{n,1} \setminus \{0\}$ we
have

\[
x^H X x = x^H \Big( \int_0^{\infty} \exp(tA)\, C \exp(tA^H)\, dt \Big) x
= \int_0^{\infty} \underbrace{x^H \exp(tA)\, C \exp(tA^H)\, x}_{> 0}\, dt > 0.
\]

The last inequality follows from the monotonicity of the integral and the fact that for
$x \neq 0$ also $\exp(tA^H) x \neq 0$, since $\exp(tA^H)$ is invertible for every real $t$. □
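A small example with a diagonal matrix $A$ shows both statements of Corollary 20.13 at work. The matrices below are chosen here purely for illustration.

```latex
% Sample data: A has the eigenvalues -1 and -2; C is real symmetric with
% eigenvalues 1 and 3, hence Hermitian positive definite.
\[
A = \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}, \qquad
C = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
\]
% For diagonal A = diag(a_1, a_2), equation (20.5) reads entrywise
% (a_i + a_j) x_{ij} = -c_{ij}, so x_{ij} = -c_{ij} / (a_i + a_j):
\[
X = \begin{bmatrix} 1 & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix}.
\]
% Check:
\[
AX + XA^H
= \begin{bmatrix} -1 & -\tfrac{1}{3} \\ -\tfrac{2}{3} & -1 \end{bmatrix}
+ \begin{bmatrix} -1 & -\tfrac{2}{3} \\ -\tfrac{1}{3} & -1 \end{bmatrix}
= \begin{bmatrix} -2 & -1 \\ -1 & -2 \end{bmatrix} = -C.
\]
% X is real symmetric with trace 3/2 > 0 and determinant 1/2 - 1/9 = 7/18 > 0,
% so both eigenvalues of X are positive and X is positive definite.
```

The same entries result from the integral formula, since $\int_0^{\infty} e^{(a_i + a_j)t} c_{ij}\, dt = -c_{ij}/(a_i + a_j)$ for $a_i + a_j < 0$.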


Exercises 


20.1 Prove Lemma 20.2. 

20.2 Construct two square matrices $A, B$ with $A \otimes B \neq B \otimes A$.

20.3 Prove Lemma 20.4. 

20.4 Prove Theorem 20.5 (3). 

20.5 Prove Lemma 20.6. 

20.6 Show that $A \otimes B$ is normal if $A \in \mathbb{C}^{m,m}$ and $B \in \mathbb{C}^{n,n}$ are normal. Is it true
that if $A \otimes B$ is unitary, then $A$ and $B$ are unitary?

20.7 Use the singular value decompositions $A = V_A \Sigma_A W_A^H \in \mathbb{C}^{m,m}$ and
$B = V_B \Sigma_B W_B^H \in \mathbb{C}^{n,n}$ to derive the singular value decomposition of $A \otimes B$.

20.8 Show that for $A \in \mathbb{C}^{m,m}$ and $B \in \mathbb{C}^{n,n}$ and the matrix 2-norm, the equation
$\|A \otimes B\|_2 = \|A\|_2 \|B\|_2$ holds.

20.9 Prove Theorem 20.11.

20.10 Let $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C \in \mathbb{C}^{m,n}$. Show that $Z(t) = \exp(tA)\, C \exp(tB)$
is the solution of the matrix differential equation $\frac{dZ}{dt} = AZ + ZB$ with the
initial condition $Z(0) = C$.


Appendix A 
A Short Introduction to MATLAB 


MATLAB¹ is an interactive software system for numerical computations, simulations
and visualizations. It contains a large number of predefined functions and allows users
to implement their programs in so-called m-files.

The name MATLAB originates from MATrix LABoratory, which indicates the
matrix orientation of the software. Indeed, matrices are the major objects in MATLAB.
Due to the simple and intuitive use of matrices, we consider MATLAB well
suited for teaching in the field of Linear Algebra.

In this short introduction we explain the most important ways to enter and operate 
with matrices in MATLAB. One can learn the essential matrix operations as well as 
important algorithms and concepts in the context of matrices (and Linear Algebra 
in general) by actively using the MATLAB-Minutes in this book. These only use 
predefined functions. 

A matrix in MATLAB can be entered in form of a list of entries enclosed by
square brackets. The entries in the list are ordered by rows in the natural order of the
indices (i.e., from “top to bottom” and “left to right”). A new row starts after every
semicolon. For example, the matrix

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\]

is entered in MATLAB by typing

    A=[1 2 3;4 5 6;7 8 9];

A semicolon after the matrix A suppresses the output in MATLAB. If it is omitted
then MATLAB writes out all the entered or computed quantities. For example, after
entering

    A=[1 2 3;4 5 6;7 8 9]

MATLAB gives the output

    A =
         1     2     3
         4     5     6
         7     8     9


¹ MATLAB® is a registered trademark of The MathWorks Inc.

One can access parts of matrices by the corresponding indices. The list of indices
from k to m is abbreviated by

    k:m

A colon : means all rows for given column indices, or all columns for given row
indices. If A is as above, then for example

    A(2,1)     is the matrix [4],
    A(3,1:2)   is the matrix [7 8],
    A(:,2:3)   is the matrix $\begin{bmatrix} 2 & 3 \\ 5 & 6 \\ 8 & 9 \end{bmatrix}$.


There are several predefined functions that produce matrices. In particular, for
given positive integers n and m,

    eye(n)       the identity matrix $I_n$,
    zeros(n,m)   an n × m matrix with all zeros,
    ones(n,m)    an n × m matrix with all ones,
    rand(n,m)    an n × m “random matrix”.


Several matrices (of appropriate sizes) can be combined to a new matrix. For example,
the commands

    A=eye(2); B=[4;3]; C=[2 -1]; D=[-5]; E=[A B;C D]

lead to

    E =
         1     0     4
         0     1     3
         2    -1    -5

The help function in MATLAB is started with the command help. In order to get
information about specific functions one adds the name of the function. For example:

    Input:          Information on:
    help ops        operations and operators in MATLAB
                    (in particular addition, multiplication, transposition)
    help matfun     MATLAB functions that operate with matrices
    help gallery    collection of example matrices
    help det        determinant
    help expm       matrix exponential function
Selected Historical Works on Linear Algebra


(We describe the content of these works using modern terms.)


• A. L. CAUCHY, Sur l’équation à l’aide de laquelle on détermine les inégalités séculaires des
mouvements des planètes, Exercises de Mathématiques, 4 (1829).
Proves that real symmetric matrices have real eigenvalues.

• H. GRASSMANN, Die lineale Ausdehnungslehre, ein neuer Zweig der Mathematik, Otto Wiegand,
Leipzig, 1844.
Contains the first development of abstract vector spaces and linear independence, including the
dimension formula for subspaces.

• J. J. SYLVESTER, Additions to the articles in the September Number of this Journal, “On a new
Class of Theorems,” and on Pascal’s Theorem, Philosophical Magazine, 37 (1850), pp. 363-370.
Introduces the terms matrix and minor.

• J. J. SYLVESTER, A demonstration of the theorem that every homogeneous quadratic polynomial
is reducible by real orthogonal substitutions to the form of a sum of positive and negative squares,
Philosophical Magazine, 4 (1852), pp. 138-142.
Proof of Sylvester’s law of inertia.

• A. CAYLEY, A memoir on the theory of matrices, Proc. Royal Soc. of London, 148 (1858),
pp. 17-37.
First presentation of matrices as independent algebraic objects, including the basic matrix
operations, the Cayley-Hamilton theorem (without a general proof) and the idea of a matrix
square root.

• K. WEIERSTRASS, Zur Theorie der bilinearen und quadratischen Formen, Monatsber. Königl.
Preußischen Akad. Wiss. Berlin, (1868), pp. 311-338.
Proof of the Weierstrass normal form, which implies the Jordan normal form.

• C. JORDAN, Traité des substitutions et des équations algébriques, Paris, 1870.
Contains the proof of the Jordan normal form independent of Weierstrass’ work.

• G. FROBENIUS, Ueber lineare Substitutionen und bilineare Formen, J. reine angew. Math., 84
(1878), pp. 1-63.
Contains the concept of the minimal polynomial, the (arguably) first complete proof of the
Cayley-Hamilton theorem, and results on equivalence, similarity and congruence of matrices (or
bilinear forms).

• G. PEANO, Calcolo Geometrico secondo l’Ausdehnungslehre di H. Grassmann preceduto dalle
operazioni della logica deduttiva, Fratelli Bocca, Torino, 1888.
Contains the first axiomatic definition of vector spaces, which Peano called “sistemi lineari”, and
studies properties of linear maps, including the (matrix) exponential function and the solution of
differential equation systems.

• I. SCHUR, Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung
auf die Theorie der Integralgleichungen, Math. Annalen, 66 (1909), pp. 488-510.
Proof of the Schur form of complex matrices.

• O. TOEPLITZ, Das algebraische Analogon zu einem Satze von Fejér, Math. Zeitschrift, 2 (1918),
pp. 187-197.
Introduces the concept of a normal bilinear form and proves the equivalence of normality and
unitary diagonalizability.

• F. D. MURNAGHAN AND A. WINTNER, A canonical form for real matrices under orthogonal
transformations, Proc. Natl. Acad. Sci. U.S.A., 17 (1931), pp. 417-420.
Proof of the real Schur form.

• C. ECKART AND G. YOUNG, A principal axis transformation for non-hermitian matrices, Bull.
Amer. Math. Soc., 45 (1939), pp. 118-121.
Proof of the modern form of the singular value decomposition of a general complex matrix.